Artboard 16light, inspiration, solution, idea, innovation,Google Sheets iconSwift icon
Published at
Updated at
Reading time
1min
This post is part of my Today I learned series in which I share all my web development learnings.

Today I discovered the String.prototype.normalize method. If you're dealing with user-generated content, it helps with making string comparisons more reliable.

Let's me show you a quick example:

// pick a random word with a German Umlaut
const word = 'über';       // displayed as 'über'
console.log(word.length);  // 4

const alikeWord = 'u\u0308ber';  // displayed as 'über'
console.log(alikeWord.length);   // 5

console.log(word === alikeWord); // false

As you see, strings that look identical can consist of different code points and units. alikeWord makes use of a Combining Diacritical Mark to generate the German Umlaut ü – specifically, it uses COMBINING DIAERESIS.

But here's the catch: the Umlaut ü also has its own Unicode codepoint. Here we have two ways to display the same glyph making a string comparison tricky.

To solve this issue you can use normalize to normalize strings.

const word = 'über';       // displayed as 'über'
console.log(word.length);  // 4

const alikeWord = 'u\u0308ber'.normalize(); // displayed as 'über'
console.log(alikeWord.length);              // 4

console.log(word === alikeWord); // true

Related Topics

Related Articles