String.prototype.normalize for safer string comparison


This post is part of my Today I learned series in which I share all my learnings regarding web development.

Today I came across the String.prototype.normalize function, which I hadn't used before. It exists to make string comparisons more reliable.

Let me show you a quick example:

// pick a random word with a German Umlaut
const word = 'über';         // displayed as 'über'
console.log( word.length );  // 4

const alikeWord = 'u\u0308ber';  // displayed as 'über'
console.log( alikeWord.length ); // 5

console.log( word === alikeWord ); // false

As you can see, strings that look completely identical to us don't have to be the same internally. The string alikeWord uses a Combining Diacritical Mark to build the German umlaut ü: the letter u followed by COMBINING DIAERESIS (U+0308). The thing is that ü also has its own code point in Unicode (U+00FC). So there are two ways to encode the same glyph, which makes comparison a bit tricky.
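To see the difference between the two strings, it helps to list their code points. Here's a minimal sketch; the codePoints helper and the variable names are my own and not something built into the language.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX.
function codePoints(str) {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const composed = '\u00FCber';    // ü as a single code point
const decomposed = 'u\u0308ber'; // u followed by COMBINING DIAERESIS

console.log(codePoints(composed));   // U+00FC, U+0062, U+0065, U+0072
console.log(codePoints(decomposed)); // U+0075, U+0308, U+0062, U+0065, U+0072
```

The first string starts with one code point, the second one with two, and that's why the lengths differ.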

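To solve this issue, we can use normalize to, well... normalize strings. ;) Called without an argument, it uses the NFC form, which combines u and COMBINING DIAERESIS into the single code point for ü.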

const word = 'über';         // displayed as 'über'
console.log( word.length );  // 4

const alikeWord = 'u\u0308ber'.normalize(); // displayed as 'über'
console.log( alikeWord.length );            // 4

console.log( word === alikeWord ); // true
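In the example above only one side is normalized, which works because word is already in its composed form. When neither string is under your control, it's probably safer to normalize both sides before comparing. A minimal sketch, assuming a made-up helper called isSameText:

```javascript
// Hypothetical helper: compare two strings after bringing both
// into the same normalization form (NFC is also normalize()'s default).
function isSameText(a, b, form = 'NFC') {
  return a.normalize(form) === b.normalize(form);
}

console.log(isSameText('\u00FCber', 'u\u0308ber'));  // true
console.log(isSameText('u\u0308ber', 'u\u0308ber')); // true
console.log(isSameText('\u00FCber', 'uber'));        // false

// NFD goes the other way and splits ü into u + COMBINING DIAERESIS.
console.log('\u00FCber'.normalize('NFD').length);    // 5
```

Which form you pick matters less than using the same one on both sides.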