How to detect Emojis in JavaScript strings
- Reading time: 3min
When dealing with user-generated content, there's a high chance that you have to deal with strings full of Emojis. Emoji rendering can come with challenges, so you may want to detect when strings include Emojis and replace them with images.
Let's find out how to spot all these cute symbols!
There are Emoji edge cases when using the described Unicode property escapes. Make sure to read to the end of the article!
Luckily, JavaScript regular expressions come with a Unicode mode these days.
MDN describes that Unicode mode treats a regular expression pattern as a sequence of Unicode code points instead of code units.
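To make the code point versus code unit difference concrete, here's a small sketch. The variable names are mine and only there for illustration.

```javascript
// An emoji outside the Basic Multilingual Plane is one code point,
// but it takes two UTF-16 code units in a JavaScript string.
const face = '\u{1F600}'; // 😀

const unitCount = face.length;       // counts code units
const pointCount = [...face].length; // string iteration walks code points

// Without the `u` flag, `.` consumes a single code unit,
// so the two-unit emoji does not fit the pattern.
const matchesWithoutU = /^.$/.test(face);

// With the `u` flag, `.` consumes a whole code point.
const matchesWithU = /^.$/u.test(face);

console.log(unitCount, pointCount);        // 2 1
console.log(matchesWithoutU, matchesWithU); // false true
```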
There's more to it, though. When you enable Unicode mode in a regular expression, you can use Unicode property escapes. Unicode property escapes (\p{} or \P{}) allow you to match Unicode characters based on their properties and characteristics.
That's right; you can match currency symbols, non-Latin characters, and, you guessed it, Emojis!
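As a quick illustration of the non-Emoji cases, here's a minimal sketch. The properties I picked (the `Sc` general category for currency symbols and `Script=Greek` for one non-Latin script) are just examples; the variable names are made up.

```javascript
// \p{Sc} matches code points in the "Currency_Symbol" general category.
const currencyRegex = /\p{Sc}/u;
const euroIsCurrency = currencyRegex.test('€');   // true
const letterIsCurrency = currencyRegex.test('E'); // false

// \p{Script=Greek} matches code points that belong to the Greek script.
const greekRegex = /\p{Script=Greek}/u;
const lambdaIsGreek = greekRegex.test('λ'); // true
const latinIsGreek = greekRegex.test('l');  // false

console.log(euroIsCurrency, letterIsCurrency, lambdaIsGreek, latinIsGreek);
```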
Here's an example snippet:
const emojiRegex = /\p{Emoji}/u;
emojiRegex.test('⭐'); // true
// The capital 'P' negates the match
const noEmojiRegex = /\P{Emoji}/u;
noEmojiRegex.test('⭐'); // false
If you want to replace and alter Emojis in JavaScript strings, you can do that with String methods such as replaceAll, too.
// Note the 'g' flag to replace all Emojis
'🙂-🙂-⭐'.replaceAll(/\p{Emoji}/ug, '_'); // '_-_-_'
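The browser support for Unicode property escapes looks pretty good, too!
Chrome | Chrome Android | Edge | Firefox | Firefox Android | Safari | Safari iOS | Samsung Internet | WebView Android |
64 | 64 | 79 | 78 | 78 | 11.1 | 11.1 | 9.0 | 64 |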
Unfortunately, as always, it's more complicated than that. Before going all in with \p{Emoji}, let's dig deeper!
After publishing this blog post, someone reached out to point out that \p{Emoji} matches digits and other characters, too. 😲
const emojiRegex = /\p{Emoji}/u;
emojiRegex.test('1'); // true
emojiRegex.test('*'); // true
emojiRegex.test('#'); // true
You probably don't want to include these code points in your Emoji detection because they're usually displayed as normal text-based characters.
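One way to sidestep the digit problem could be a narrower Unicode property. The sketch below tries \p{Extended_Pictographic} and \p{Emoji_Presentation}; both are real Unicode binary properties, but picking them here is my assumption and not a complete answer. For example, \p{Emoji_Presentation} skips symbols that default to text rendering, such as a bare heart.

```javascript
const pictographic = /\p{Extended_Pictographic}/u;
const emojiPresentation = /\p{Emoji_Presentation}/u;

const star = '\u2B50';  // ⭐
const heart = '\u2764'; // ❤ (no variation selector attached)

// Digits and keypad symbols are not matched by either property.
const digitMatches = pictographic.test('1') || emojiPresentation.test('1');
const hashMatches = pictographic.test('#') || emojiPresentation.test('#');
const asteriskMatches = pictographic.test('*') || emojiPresentation.test('*');

// The star is matched by both properties.
const starPictographic = pictographic.test(star);
const starPresentation = emojiPresentation.test(star);

// The bare heart is pictographic, but it defaults to text presentation.
const heartPictographic = pictographic.test(heart);
const heartPresentation = emojiPresentation.test(heart);

console.log(digitMatches, hashMatches, asteriskMatches); // false false false
console.log(starPictographic, starPresentation);         // true true
console.log(heartPictographic, heartPresentation);       // true false
```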
What counts as Emoji and what doesn't, then?
I'd say every tiny rendered comic icon counts, but unfortunately Emoji rendering depends on the operating system and the installed fonts. Just because you see a cute Emoji in front of you, it doesn't mean that someone else sees it, too.
And to make it more complicated: just because you see one rendered Emoji image, it doesn't mean that it's a single codepoint. It can be a combination of multiple Emojis and special characters.
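To see how many code points hide behind a single rendered image, you can spread a string into its code points. This sketch builds the "Family" Emoji from escapes so that the invisible joiner characters are explicit; the variable names are only illustrative.

```javascript
// Man + ZWJ + Woman + ZWJ + Girl + ZWJ + Boy, usually rendered as one image.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

// Spreading a string iterates over code points, not code units.
const codePoints = [...family].map(
  (char) => 'U+' + char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
);

// Only the four people carry the Emoji property; U+200D (zero-width joiner) does not.
const emojiCodePoints = [...family].filter((char) => /\p{Emoji}/u.test(char));

console.log(family.length);          // 11 (UTF-16 code units)
console.log(codePoints.length);      // 7
console.log(codePoints.join(' '));   // U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466
console.log(emojiCodePoints.length); // 4
```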
If you have comments on Emoji detection in JavaScript, please give me a shoutout on Twitter or write me a good old email. I'm keen on learning more about it!
Mathias Bynens pointed out that there are shortcomings with this approach to Emoji detection. A property escape such as \p{Emoji} matches every single Emoji code point, and this can be a problem.
Let's have a look at an example:
"👨‍👩‍👧‍👦".replaceAll(/\p{Emoji}/gu, '-'); // '----' (the invisible zero-width joiners stay between the dashes)
Various Emojis, such as the "Family" one, are rendered as a single symbol but consist of more than one code point. Unicode property escapes match each of these code points individually, so you might run into unexpected behavior.
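If you'd rather treat such a sequence as one unit, one possible approach is to split the string into grapheme clusters first and then test each cluster. The sketch below assumes Intl.Segmenter is available in your runtime; the helper name replaceEmojiClusters and the choice of \p{Extended_Pictographic} are mine, so treat it as a starting point rather than a complete solution.

```javascript
// Hypothetical helper: replaces every grapheme cluster that contains
// a pictographic code point with the given replacement string.
function replaceEmojiClusters(text, replacement) {
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  let result = '';

  for (const { segment } of segmenter.segment(text)) {
    // A cluster such as the "Family" Emoji arrives here as one segment.
    result += /\p{Extended_Pictographic}/u.test(segment) ? replacement : segment;
  }

  return result;
}

// Man + ZWJ + Woman + ZWJ + Girl + ZWJ + Boy
const familyEmoji = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

const replaced = replaceEmojiClusters(`a${familyEmoji}b1`, '-');
console.log(replaced); // 'a-b1'
```

The digit stays untouched, and the whole family collapses into a single dash because the segmenter keeps joined sequences together.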
If you wonder what could count as an Emoji, have a look at this extensive list.
There's a reason why Mathias' emoji-regex package has 49 million weekly downloads, so make sure to check it out!