String.prototype.normalize for safer string comparison

Published at: Apr 17 2017
Updated at: Feb 07 2022

This post is part of my Today I learned series in which I share all my web development learnings.

Today I discovered the String.prototype.normalize method. If you're dealing with user-generated content, it helps with making string comparisons more reliable.

Let me show you a quick example:

// pick a random word with a German Umlaut
const word = 'über';       // displayed as 'über'
console.log(word.length);  // 4

const alikeWord = 'u\u0308ber';  // displayed as 'über'
console.log(alikeWord.length);   // 5

console.log(word === alikeWord); // false

As you can see, strings that look identical can consist of different code points, and therefore of a different number of code units. alikeWord uses a Combining Diacritical Mark to build the German Umlaut ü: the letter u followed by COMBINING DIAERESIS (U+0308).

But here's the catch: the Umlaut ü also has its own Unicode code point (U+00FC). That gives us two ways to encode the same glyph, which makes string comparison tricky.
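To make the difference visible, it can help to print the code points of both strings. The following is a minimal sketch; codePoints is a hypothetical helper name, not a built-in method, and the escape sequences are used only so that both spellings are unambiguous in the source code.

```javascript
// Hypothetical helper (not a built-in): list a string's code points as U+XXXX labels.
// Array.from iterates a string by code point, not by UTF-16 code unit.
function codePoints(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const precomposed = '\u00FCber'; // ü as a single code point
const combined = 'u\u0308ber';   // u followed by COMBINING DIAERESIS

console.log(codePoints(precomposed)); // [ 'U+00FC', 'U+0062', 'U+0065', 'U+0072' ]
console.log(codePoints(combined));    // [ 'U+0075', 'U+0308', 'U+0062', 'U+0065', 'U+0072' ]
```

Both strings render as 'über', but the second one carries an extra code point, which is why its length is 5.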

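To solve this issue, you can call normalize on the string. Called without an argument, it uses Unicode Normalization Form C (NFC), which combines a base letter and its combining marks into a single precomposed code point where one exists.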

const word = 'über';       // displayed as 'über'
console.log(word.length);  // 4

const alikeWord = 'u\u0308ber'.normalize(); // displayed as 'über'
console.log(alikeWord.length);              // 4

console.log(word === alikeWord); // true
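When both strings come from users, you usually can't know which one needs normalizing, so it seems safer to normalize both sides before comparing. Here is a minimal sketch under that assumption; equalsNormalized is a hypothetical helper name, and the form parameter is passed straight through to normalize, which accepts 'NFC' (the default), 'NFD', 'NFKC', and 'NFKD'.

```javascript
// Hypothetical helper: compare two strings after bringing both into the same form.
function equalsNormalized(a, b, form = 'NFC') {
  return a.normalize(form) === b.normalize(form);
}

const typed = '\u00FCber';  // ü as a single code point
const pasted = 'u\u0308ber'; // u followed by COMBINING DIAERESIS

console.log(typed === pasted);                       // false
console.log(equalsNormalized(typed, pasted));        // true
console.log(equalsNormalized(typed, pasted, 'NFD')); // true

// NFD goes the other way and splits ü into u + COMBINING DIAERESIS.
console.log(typed.normalize('NFD').length);          // 5
```

Which form you pick matters less than using the same one on both sides of the comparison.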

About Stefan Judis

Frontend nerd with over ten years of experience, freelance dev, "Today I Learned" blogger, conference speaker, and Open Source maintainer.