String.prototype.normalize for safer string comparison
Written by Stefan Judis
This post is part of my Today I learned series in which I share all my web development learnings.
Today I discovered the String.prototype.normalize method. If you're dealing with user-generated content, it helps make string comparisons more reliable.
Let me show you a quick example:
// pick a random word with a German Umlaut
const word = 'über'; // displayed as 'über'
console.log(word.length); // 4
const alikeWord = 'u\u0308ber'; // displayed as 'über'
console.log(alikeWord.length); // 5
console.log(word === alikeWord); // false
As you can see, strings that look identical can consist of different code points and code units. alikeWord uses a Combining Diacritical Mark to produce the German Umlaut ü; specifically, it uses U+0308 COMBINING DIAERESIS.
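To see which code points each version is made of, you can list them. This is a minimal sketch; codePoints is a hypothetical helper name, not a built-in. The precomposed word is written as '\u00FCber' so the example doesn't depend on how the source file encodes ü.

```javascript
// Hypothetical helper: list the code points of a string as "U+XXXX" labels.
// Array.from walks the string by code point, not by UTF-16 code unit.
function codePoints(str) {
  return Array.from(str, (char) =>
    'U+' + char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  ).join(' ');
}

console.log(codePoints('\u00FCber'));  // U+00FC U+0062 U+0065 U+0072
console.log(codePoints('u\u0308ber')); // U+0075 U+0308 U+0062 U+0065 U+0072
```

Both strings render as "über", but the second one has an extra code point: a plain u followed by the combining diaeresis.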
But here's the catch: the Umlaut ü also has its own Unicode code point, U+00FC LATIN SMALL LETTER U WITH DIAERESIS. So there are two ways to encode the same glyph, which makes string comparison tricky.
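To solve this issue, you can use normalize to convert strings into a common Unicode normalization form. Called without an argument, normalize() uses NFC (canonical composition), which turns a u followed by COMBINING DIAERESIS into the single code point ü.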
const word = 'über'; // displayed as 'über'
console.log(word.length); // 4
const alikeWord = 'u\u0308ber'.normalize(); // displayed as 'über'
console.log(alikeWord.length); // 4
console.log(word === alikeWord); // true
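In the example above only alikeWord is normalized, which works because word is already in NFC. With user-generated content you usually don't know which form each side is in, so a safer comparison normalizes both. A minimal sketch, assuming a hypothetical helper named isSameText (not a built-in):

```javascript
// Hypothetical helper: compare two strings after bringing both
// into the same Unicode normalization form (NFC by default).
function isSameText(a, b, form = 'NFC') {
  return a.normalize(form) === b.normalize(form);
}

const precomposed = '\u00FCber'; // 'über' as 4 code points
const decomposed = 'u\u0308ber'; // 'über' as 5 code points

console.log(precomposed === decomposed);          // false
console.log(isSameText(precomposed, decomposed)); // true

// NFD goes the other way and splits ü into u + COMBINING DIAERESIS.
console.log(precomposed.normalize('NFD').length); // 5
console.log(decomposed.normalize('NFC').length);  // 4
```

Which form you choose matters less than applying the same one to both sides of the comparison.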
Was this TIL post helpful?
Yes? Cool! You might want to check out Web Weekly for more quick learnings. The last edition went out 24 days ago.