String.prototype.normalize for safer string comparison
Today I discovered the String.prototype.normalize method. If you're dealing with user-generated content, it helps make string comparisons more reliable.
Let me show you a quick example:
    // pick a random word with a German Umlaut
    const word = 'über'; // displayed as 'über'
    console.log(word.length); // 4

    const alikeWord = 'u\u0308ber'; // displayed as 'über'
    console.log(alikeWord.length); // 5

    console.log(word === alikeWord); // false
As you see, strings that look identical can consist of different code points and code units.
alikeWord makes use of a Combining Diacritical Mark to generate the German Umlaut ü. Specifically, it uses U+0308 COMBINING DIAERESIS placed after a plain u.
But here's the catch: the Umlaut ü also has its own precomposed Unicode code point (U+00FC). So there are two ways to encode the same glyph, which makes string comparison tricky.
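To make the difference visible, here is a minimal sketch that lists the code points of both spellings. The codePoints helper and the variable names are hypothetical and only serve as illustration; the precomposed string is written with an explicit \u00FC escape so the example does not depend on how your editor encodes ü.

```javascript
// Hypothetical helper (not part of the original post):
// returns the code points of a string in U+XXXX notation.
function codePoints(str) {
  // Array.from iterates a string by code point, not by UTF-16 code unit
  return Array.from(str, (char) =>
    'U+' + char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const precomposed = '\u00FCber'; // 'über' using the single code point U+00FC
const combined = 'u\u0308ber'; // 'über' as 'u' followed by COMBINING DIAERESIS

console.log(codePoints(precomposed)); // [ 'U+00FC', 'U+0062', 'U+0065', 'U+0072' ]
console.log(codePoints(combined)); // [ 'U+0075', 'U+0308', 'U+0062', 'U+0065', 'U+0072' ]
```

Both strings render the same, but the second one carries an extra code point, which is why the length and the strict equality check differ.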
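To solve this issue, you can use normalize. It converts a string to a Unicode normalization form (NFC by default), which collapses the combining sequence into the single precomposed code point: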
    const word = 'über'; // displayed as 'über'
    console.log(word.length); // 4

    const alikeWord = 'u\u0308ber'.normalize(); // displayed as 'über'
    console.log(alikeWord.length); // 4

    console.log(word === alikeWord); // true
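With user-generated content you usually can't know which side of a comparison is already normalized, so it may be safer to normalize both. The following is a small sketch under that assumption; equalsNormalized and the variable names are hypothetical, not something the language provides.

```javascript
// Hypothetical helper (not part of the original post):
// compares two strings after bringing both into the same normalization form.
function equalsNormalized(a, b, form = 'NFC') {
  return a.normalize(form) === b.normalize(form);
}

const typed = '\u00FCber'; // precomposed ü (U+00FC)
const pasted = 'u\u0308ber'; // u + COMBINING DIAERESIS (U+0308)

console.log(typed === pasted); // false
console.log(equalsNormalized(typed, pasted)); // true
console.log(equalsNormalized(typed, pasted, 'NFD')); // true

// NFD goes the other way and decomposes ü into two code points
console.log(typed.normalize('NFD').length); // 5
```

Which form you pick matters less than using the same one on both sides: NFC composes characters where possible, NFD decomposes them.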