This post is part of my Today I learned series in which I share all my web development learnings.

Today I discovered the String.prototype.normalize method. If you're dealing with user-generated content, it helps with making string comparisons more reliable.

Let me show you a quick example:

// pick a random word with a German Umlaut
const word = 'über';       // displayed as 'über'
console.log(word.length);  // 4

const alikeWord = 'u\u0308ber';  // displayed as 'über'
console.log(alikeWord.length);   // 5

console.log(word === alikeWord); // false

As you can see, strings that look identical can consist of different code points and code units. alikeWord uses a Combining Diacritical Mark to produce the German Umlaut ü: specifically, a plain u followed by U+0308 COMBINING DIAERESIS.

But here's the catch: the Umlaut ü also has its own Unicode code point (U+00FC). That gives us two ways to display the same glyph, which makes string comparison tricky.
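
To make the difference visible, here is a small sketch that lists the code points of both spellings. The helper name toCodePoints is made up for this illustration and is not a built-in; the strings are written with escape sequences so that the example doesn't depend on how your editor stores the ü.

```javascript
// Hypothetical helper (not a built-in): list a string's code points as U+XXXX.
// Array.from iterates a string by code point, not by UTF-16 code unit.
function toCodePoints(str) {
  return Array.from(str, (char) =>
    'U+' + char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const precomposed = '\u00FCber'; // ü as a single code point
const combined = 'u\u0308ber';   // u followed by COMBINING DIAERESIS

console.log(toCodePoints(precomposed)); // [ 'U+00FC', 'U+0062', 'U+0065', 'U+0072' ]
console.log(toCodePoints(combined));    // [ 'U+0075', 'U+0308', 'U+0062', 'U+0065', 'U+0072' ]
```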

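To solve this issue, you can use normalize to convert strings to a common form before comparing them. Called without an argument, it uses the NFC form, which combines a base character and its combining marks into a single code point where one exists.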

const word = 'über';       // displayed as 'über'
console.log(word.length);  // 4

const alikeWord = 'u\u0308ber'.normalize(); // displayed as 'über'
console.log(alikeWord.length);              // 4

console.log(word === alikeWord); // true
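
If you compare user-generated strings in several places, it may help to wrap the comparison in a small function. The following is only a sketch: isSameText is a hypothetical helper name, and choosing NFC for both sides is an assumption (any form works as long as both strings use the same one). It also shows that normalize accepts an explicit form, for example 'NFD' to go the other way and decompose characters.

```javascript
const composedWord = '\u00FCber';   // ü as a single code point
const decomposedWord = 'u\u0308ber'; // u followed by COMBINING DIAERESIS

// normalize() accepts an optional form: 'NFC' (default), 'NFD', 'NFKC' or 'NFKD'.
console.log(decomposedWord.normalize('NFC').length); // 4
console.log(composedWord.normalize('NFD').length);   // 5

// Hypothetical helper: normalize both sides to the same form, then compare.
function isSameText(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

console.log(composedWord === decomposedWord);          // false
console.log(isSameText(composedWord, decomposedWord)); // true
```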

About the author

Frontend nerd with over ten years of experience, "Today I Learned" blogger, conference speaker, Tiny helpers maintainer, and DevRel at Checkly.