Published at
Updated at
Reading time
1min

Today I discovered the String.prototype.normalize method. If you're dealing with user-generated content, it helps with making string comparisons more reliable.

Let's me show you a quick example:

// pick a random word with a German Umlaut
const word = 'über';       // displayed as 'über'
console.log(word.length);  // 4

const alikeWord = 'u\u0308ber';  // displayed as 'über'
console.log(alikeWord.length);   // 5

console.log(word === alikeWord); // false

As you see, strings that look identical can consist of different code points and units. alikeWord makes use of a Combining Diacritical Mark to generate the German Umlaut ü – specifically, it uses COMBINING DIAERESIS.

But here's the catch: the Umlaut ü also has its own Unicode codepoint. Here we have two ways to display the same glyph making a string comparison tricky.

To solve this issue you can use normalize to normalize strings.

const word = 'über';       // displayed as 'über'
console.log(word.length);  // 4

const alikeWord = 'u\u0308ber'.normalize(); // displayed as 'über'
console.log(alikeWord.length);              // 4

console.log(word === alikeWord); // true
If you enjoyed this article...

Join 6.3k readers and learn something new every week with Web Weekly.

Reply to this post and share your thoughts via good old email.
Stefan standing in the park in front of a green background

About Stefan Judis

Frontend nerd with over ten years of experience, freelance dev, "Today I Learned" blogger, conference speaker, and Open Source maintainer.

Related Topics

Related Articles