This post is part of my Today I learned series in which I share all my web development learnings.

Regular expressions (regex) are tough. It always takes me a few minutes to understand what a particular regular expression does. Nevertheless, there's no question about their usefulness.

Today, I had my Sunday morning coffee and worked my way through the slide deck "What's new in ES2018" by Benedikt Meurer and Mathias Bynens.

There is so much useful information in these slides. Besides new language features such as async iteration, object spread properties and named capture groups in regular expressions, they cover regular expression lookaheads (and the upcoming lookbehinds).

Occasionally, regular expression lookaheads cross my path, but I've never had to use them. As their counterpart, lookbehinds, are going to be in the language, too, I decided to read some documentation and finally learn what regex lookaheads and lookbehinds are.

Since publishing this post, lookahead and lookbehind assertions made it into all the major browser engines! 🎉 Browser support information is included in this post.

Regex lookaheads in JavaScript

With lookaheads, you can define patterns that only match when they're followed (or not followed) by another pattern.

The MDN article about regular expressions describes two different types of lookaheads in regular expressions.

Positive and negative lookaheads:

  • x(?=y) – positive lookahead (matches 'x' when it's followed by 'y')
  • x(?!y) – negative lookahead (matches 'x' when it's not followed by 'y')
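To make the two forms concrete, here is a minimal sketch using the literal letters x and y from the list above. The variable names are only for illustration.

```javascript
// Positive lookahead: "x" matches only when "y" comes right after it.
const positive = /x(?=y)/;
console.log(positive.test('xy')); // true
console.log(positive.test('xz')); // false

// Negative lookahead: "x" matches only when "y" does NOT come right after it.
const negative = /x(?!y)/;
console.log(negative.test('xy')); // false
console.log(negative.test('xz')); // true
console.log(negative.test('x'));  // true (the end of the string is "not y", too)
```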

Captured groups in JavaScript – the similar-looking companions

Oh well... x(?=y) – that's a tricky syntax. What confused me initially is that I usually use () for capturing or non-capturing groups in regular expressions.

Let's look at an example of a captured group:

const regex = /\w+\s(\w+)\s\w+/;

regex.exec('eins zwei drei');
// ['eins zwei drei', 'zwei']
//                      /\
//                      ||
//                captured group
//                 defined with
//                    (\w+)

The regular expression above captures a word (zwei in this case) that is surrounded by spaces and another word on both ends.

Regular expression lookaheads are not like captured groups

Let's look at a typical example that you'll find when you read about lookaheads in JavaScript regular expressions.

// use positive regex lookahead
const regex = /Max(?= Mustermann)/;

regex.exec('Max Mustermann');
// ['Max']
regex.exec('Max Müller');
// null

This example matches Max whenever it is followed by a space and Mustermann. Otherwise, it doesn't match and exec returns null. The interesting part is that it only matches Max and not the pattern defined in the lookahead ((?= Mustermann)).

This exclusion can seem weird after working with regular expressions for a while, but when you think about it, that's the difference between lookaheads and groups. Using lookaheads, you can test strings against patterns without including them in the resulting match.
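A small side-by-side sketch may help here. It compares the lookahead with a capturing group that covers the same text; the names lookaheadRegex and groupRegex and the replacement value 'Erika' are made up for this illustration.

```javascript
const lookaheadRegex = /Max(?= Mustermann)/;
const groupRegex = /Max( Mustermann)/;

const lookaheadMatch = lookaheadRegex.exec('Max Mustermann');
const groupMatch = groupRegex.exec('Max Mustermann');

// The lookahead is only a condition, it's not part of the match.
console.log(lookaheadMatch[0]);     // 'Max'
console.log(lookaheadMatch.length); // 1 (no captured group)

// The group is part of the match and is captured as well.
console.log(groupMatch[0]); // 'Max Mustermann'
console.log(groupMatch[1]); // ' Mustermann'

// The difference becomes visible when replacing:
console.log('Max Mustermann'.replace(lookaheadRegex, 'Erika')); // 'Erika Mustermann'
console.log('Max Mustermann'.replace(groupRegex, 'Erika'));     // 'Erika'
```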

MDN Compat Data (source)
Browser support info for Lookahead assertion: (?=...), (?!...)

  • chrome: 1
  • chrome_android: 1
  • edge: 12
  • firefox: 1
  • firefox_android: 1
  • safari: 1
  • safari_ios: 1
  • samsunginternet_android: 1.5
  • webview_android: 1

The "Max Mustermann" example is not very useful, though. Let's dive into positive and negative lookaheads with a real-world use case.

Positive regular expression lookaheads in JavaScript

Let's assume you have a long string of Markdown that includes a list of people and their food preferences. How would you figure out which people are vegan when everything's just a long string?

const people = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;

// use positive regex lookahead
const regex = /-\s(\w+?)\s(?=\(vegan\))/g;
//                |----|  |-----------|
//                  /            \
//           one or more          \
//           word characters     positive lookahead
//           but as few as       => followed by "(vegan)"
//           possible

let result = regex.exec(people);

while(result) {
  console.log(result[1]);
  result = regex.exec(people);
}

// Result:
// Billa
// Fred

Let's have a quick look at the regular expression and try to phrase it in plain language.

const regex = /-\s(\w+?)\s(?=\(vegan\))/g;

Match any dash followed by one whitespace character, followed by one or more (but as few as possible) word characters (A-Za-z0-9_), followed by a whitespace character, when all of that is followed by the pattern "(vegan)".
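If the exec loop feels clunky, the same matches can be collected in one go. This is only a sketch and assumes an environment that supports String.prototype.matchAll (ES2020); peopleList, veganRegex and vegans are illustrative names.

```javascript
const peopleList = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;

// Same pattern as above; matchAll requires the global (g) flag.
const veganRegex = /-\s(\w+?)\s(?=\(vegan\))/g;

// Turn the iterator of matches into an array of the captured names.
const vegans = Array.from(peopleList.matchAll(veganRegex), (match) => match[1]);

console.log(vegans); // ['Billa', 'Fred']
```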

Negative/negating regex lookaheads in JavaScript

On the other hand, how would you figure out who is not vegan?

const people = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;

// use negative regex lookahead
const regex = /-\s(\w+)\s(?!\(vegan\))/g;
//                |---|  |-----------|
//                  /          \
//           one or more        \
//           word characters    negative lookahead
//           (greedy)           => not followed by "(vegan)"

let result = regex.exec(people);

while(result) {
  console.log(result[1]);
  result = regex.exec(people);
}

// Result:
// Bob
// Francis
// Elli

Let's have a quick look at the regular expression and try to phrase it in words, too.

const regex = /-\s(\w+)\s(?!\(vegan\))/g;

Match any dash followed by one whitespace character, followed by one or more word characters (A-Za-z0-9_), followed by a whitespace character (which includes line breaks), when all of that is not followed by the pattern "(vegan)".
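The detail about line breaks matters for entries like "- Francis" that have no preference at all. The sketch below shows it with single-line inputs. The second pattern, boundaryRegex, is a hypothetical variation of mine and not part of the original example: it ends the name at a word boundary and moves the whitespace into the lookahead.

```javascript
// The pattern from above (without the g flag to keep exec stateless).
const originalRegex = /-\s(\w+)\s(?!\(vegan\))/;

console.log(originalRegex.exec('- Francis\n')[1]); // 'Francis' (the line break satisfies \s)
console.log(originalRegex.exec('- Francis'));      // null (nothing left for the second \s)

// Hypothetical variation: no whitespace is required after the name.
const boundaryRegex = /-\s(\w+)\b(?!\s\(vegan\))/;

console.log(boundaryRegex.exec('- Francis')[1]);          // 'Francis'
console.log(boundaryRegex.exec('- Bob (vegetarian)')[1]); // 'Bob'
console.log(boundaryRegex.exec('- Billa (vegan)'));       // null
```

Whether the stricter or the more lenient pattern fits better depends on how the list is formatted, for example whether the last line ends with a line break.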

Regular expression lookaheads get company: lookbehinds

Lookbehinds work the same way but for leading patterns. Lookaheads consider the patterns after the matching part whereas lookbehinds consider the patterns before.
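Mirroring the x/y notation used for lookaheads, a minimal sketch of the two lookbehind forms could look like this. It assumes a JavaScript engine with lookbehind support; the variable names are illustrative.

```javascript
// Positive lookbehind: "x" matches only when "y" comes right before it.
const positiveBehind = /(?<=y)x/;
console.log(positiveBehind.test('yx')); // true
console.log(positiveBehind.test('zx')); // false

// Negative lookbehind: "x" matches only when "y" does NOT come right before it.
const negativeBehind = /(?<!y)x/;
console.log(negativeBehind.test('yx')); // false
console.log(negativeBehind.test('zx')); // true
console.log(negativeBehind.test('x'));  // true (the start of the string is "not y", too)
```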

MDN Compat Data (source)
Browser support info for Lookbehind assertion: (?<=...), (?<!...)

  • chrome: 62
  • chrome_android: 62
  • edge: 79
  • firefox: 78
  • firefox_android: 78
  • safari: 16.4
  • safari_ios: 16.4
  • samsunginternet_android: 8.0
  • webview_android: 62

When we flip the example strings around and adjust the regular expression to use lookbehinds, everything still works.

const people = `
- (vegetarian) Bob
- (vegan) Billa
- Francis
- (vegetarian) Elli
- (vegan) Fred
`;

// use positive regex lookbehind
const regex = /(?<=\(vegan\))\s(\w+)/g;
//             |------------|  |---|
//                  /             \__
//         positive lookbehind        \
//       => following "(vegan)"     one or more
//                                  word characters

let result = regex.exec(people);

while(result) {
  console.log(result[1]);
  result = regex.exec(people);
}

// Result:
// Billa
// Fred
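The negative lookbehind (?<!...) can be used the same way to list everybody who is not vegan. This is a sketch based on the flipped list above and assumes an engine with lookbehind support; flippedPeople, notVeganRegex and notVegans are names chosen for illustration.

```javascript
const flippedPeople = `
- (vegetarian) Bob
- (vegan) Billa
- Francis
- (vegetarian) Elli
- (vegan) Fred
`;

// A whitespace character that is NOT preceded by "(vegan)",
// followed by one or more word characters (the name).
const notVeganRegex = /(?<!\(vegan\))\s(\w+)/g;

const notVegans = [];
let notVeganMatch = notVeganRegex.exec(flippedPeople);

while (notVeganMatch) {
  notVegans.push(notVeganMatch[1]);
  notVeganMatch = notVeganRegex.exec(flippedPeople);
}

console.log(notVegans); // ['Bob', 'Francis', 'Elli']
```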

Side note: I usually recommend RegExr for fiddling with regular expressions, but it doesn't support lookbehinds yet.

Additional resources

If you're interested in more cutting-edge features, have a look at Mathias' and Benedikt's slides on new features coming to JavaScript. There is way more exciting stuff to come.

To remember the syntax for lookaheads and lookbehinds, I created a quick cheat sheet about it.


About Stefan Judis

Frontend nerd with over ten years of experience, freelance dev, "Today I Learned" blogger, conference speaker, and Open Source maintainer.
