Lookaheads (and lookbehinds) in JavaScript regular expressions


This post is part of my Today I learned series in which I share all my learnings regarding web development.

Regular expressions are a challenge in themselves. It always takes me a few minutes to understand what a particular regular expression does, but there's no question about their usefulness.

Today, I had my Sunday morning coffee and worked my way through the slide deck "What's new in ES2018" by Benedikt Meurer and Mathias Bynens.

There is so much useful information in these slides. Besides new language features like asynchronous iteration, object rest/spread properties and named capture groups in regular expressions (🎉), they also cover lookaheads (and the upcoming lookbehinds) in regular expressions.

Now and then lookaheads in JavaScript regular expressions cross my way, and I have to admit that I've never had to use them. But now that their counterpart, lookbehinds, is coming to the language too, I decided to read some documentation and finally learn what lookaheads are.

# Lookaheads in JavaScript

With lookaheads, you can define patterns that only match when they're followed or not followed by another pattern.

The MDN article about regular expressions describes two types of lookaheads: positive and negative.

  • x(?=y): positive lookahead (matches 'x' when it's followed by 'y')
  • x(?!y): negative lookahead (matches 'x' when it's not followed by 'y')
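To see both kinds side by side, here is a minimal sketch that uses single characters as stand-ins for x and y. It only relies on RegExp.prototype.test and String.prototype.match, so it should run in any modern JavaScript engine.

```javascript
// Positive lookahead: 'x' matches only when the next character is 'y'.
const positive = /x(?=y)/;
// Negative lookahead: 'x' matches only when the next character is not 'y'.
const negative = /x(?!y)/;

console.log(positive.test('xy')); // true
console.log(positive.test('xz')); // false
console.log(negative.test('xy')); // false
console.log(negative.test('xz')); // true

// The lookahead is only a condition; 'y' is not part of the match.
console.log('xy'.match(positive)[0]); // 'x'
```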

# Capturing groups in JavaScript: the similar-looking companions

Oh well... x(?=y): that's a tricky syntax if you ask me. What confused me initially is that I usually use () for capturing groups in JavaScript regular expressions.

Let's look at an example for a captured group:

```javascript
const regex = /\w+\s(\w+)\s\w+/;

regex.exec('eins zwei drei');
// ['eins zwei drei', 'zwei']
//                      /\
//                      ||
//                captured group
//                 defined with
//                    (\w+)
```

What you see above is a regular expression that captures a word (zwei in this case) that sits between two other words, separated from each of them by a single whitespace character.
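exec returns more than the two strings shown in the comment above: the result is an array with extra properties such as index. A small sketch that pulls the values out by destructuring, plus the same pattern written with an ES2018 named capture group (the group name middle is just an illustrative choice):

```javascript
const regex = /\w+\s(\w+)\s\w+/;
const match = regex.exec('eins zwei drei');

// Index 0 holds the whole match, index 1 the first capturing group.
const [fullMatch, middleWord] = match;
console.log(fullMatch); // 'eins zwei drei'
console.log(middleWord); // 'zwei'

// The same pattern with a named capture group (ES2018).
const named = /\w+\s(?<middle>\w+)\s\w+/;
const namedMatch = named.exec('eins zwei drei');
console.log(namedMatch.groups.middle); // 'zwei'
```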

# Lookaheads are not like captured groups

So let's look at a typical example that you'll find when you read about lookaheads in JavaScript regular expressions.

```javascript
const regex = /Max(?= Mustermann)/;

regex.exec('Max Mustermann');
// ['Max']
regex.exec('Max Müller');
// null
```

This example matches Max only when it is followed by a space and Mustermann; otherwise it doesn't match and exec returns null. The interesting part for me is that it only matches Max and not the pattern defined in the lookahead. That seemed weird after working with regular expressions for a while, but when you think about it, that's the point of lookaheads: they check what comes next without including it in the match.
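One way to see the difference from a capturing group is to run both variants against the same string. This sketch compares a lookahead with a plain group (the names withLookahead and withGroup are only for illustration):

```javascript
const withLookahead = /Max(?= Mustermann)/;
const withGroup = /Max( Mustermann)/;

const lookaheadMatch = withLookahead.exec('Max Mustermann');
const groupMatch = withGroup.exec('Max Mustermann');

// The lookahead checks " Mustermann" but does not consume it.
console.log(lookaheadMatch[0]); // 'Max'
// The capturing group consumes " Mustermann", so it is part of the match...
console.log(groupMatch[0]); // 'Max Mustermann'
// ...and is additionally available on its own.
console.log(groupMatch[1]); // ' Mustermann'
```

Because a lookahead is zero-width, it never moves the match position forward; matching simply continues right after Max.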

The "Max Mustermann" example isn't very useful in my opinion, so let's dive into positive and negative lookaheads with a real-world use case.

# Positive lookahead

Let's assume you have a long string of Markdown that includes a list of people and their food preferences. How would you figure out which people are vegan when everything's just a long string?

```javascript
const people = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;

const regex = /-\s(\w+?)\s(?=\(vegan\))/g;
//                |----|  |-----------|
//                  /            \
//           one or more          \
//           word characters     positive lookahead
//           but as few as       => followed by "(vegan)"
//           possible

let result = regex.exec(people);

while (result) {
  console.log(result[1]);
  result = regex.exec(people);
}

// Result:
// Billa
// Fred
```

Let's have a quick look at the regular expression and try to phrase it in words.

```javascript
const regex = /-\s(\w+?)\s(?=\(vegan\))/g;
```

Alright... let's do this!

Match any dash followed by one whitespace character, followed by one or more (but as few as possible) word characters (A-Za-z0-9_), followed by a whitespace character, but only if the pattern "(vegan)" comes next.
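If your runtime supports String.prototype.matchAll (added in ES2020, so newer than the ES2018 features discussed here), the exec loop above can be written more compactly. A sketch under that assumption:

```javascript
const people = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;

const regex = /-\s(\w+?)\s(?=\(vegan\))/g;

// matchAll requires a /g regex and returns an iterator over all matches,
// so there is no need to call exec() in a loop.
const vegans = [...people.matchAll(regex)].map((match) => match[1]);

console.log(vegans); // ['Billa', 'Fred']
```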

# Negative/negating lookaheads

On the other hand, how would you figure out who is not vegan?

```javascript
const people = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;

const regex = /-\s(\w+)\s(?!\(vegan\))/g;
//                |---|  |-----------|
//                  /          \
//           one or more        \
//           word characters    negative lookahead
//           (as many as        => not followed by "(vegan)"
//           possible)

let result = regex.exec(people);

while (result) {
  console.log(result[1]);
  result = regex.exec(people);
}

// Result:
// Bob
// Francis
// Elli
```

Let's have a quick look at the regular expression and try to phrase it in words, too.

```javascript
const regex = /-\s(\w+)\s(?!\(vegan\))/g;
```

Match any dash followed by one whitespace character, followed by one or more (as many as possible) word characters (A-Za-z0-9_), followed by a whitespace character (which includes line breaks), but only if the pattern "(vegan)" does not come next.
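The remark about line breaks matters: Francis only matches because \s can consume the newline after his name. The following sketch (collectNames is a hypothetical helper, not part of any API) shows what happens when the last line has no trailing newline, plus one possible variation that moves the whitespace into the lookahead:

```javascript
// Hypothetical helper: collects capture group 1 of every match of a /g regex.
function collectNames(regex, text) {
  const names = [];
  regex.lastIndex = 0; // start from the beginning on every call
  let result;
  while ((result = regex.exec(text)) !== null) {
    names.push(result[1]);
  }
  return names;
}

// The regex from above: requires a whitespace character after the name.
const notVegan = /-\s(\w+)\s(?!\(vegan\))/g;

// "Francis" is followed by a line break, which \s matches.
const withNewline = collectNames(notVegan, '- Francis\n- Fred (vegan)\n');
console.log(withNewline); // ['Francis']

// On a last line without a trailing newline, \s has nothing to match.
const withoutNewline = collectNames(notVegan, '- Fred (vegan)\n- Francis');
console.log(withoutNewline); // []

// One possible variation: \b ends the name and the whitespace moves
// into the negative lookahead, so nothing has to follow the name.
const notVeganAnywhere = /-\s(\w+)\b(?!\s\(vegan\))/g;
const variation = collectNames(notVeganAnywhere, '- Fred (vegan)\n- Francis');
console.log(variation); // ['Francis']
```

The \b in the variation is what keeps the greedy \w+ from backing off to "Fre", where the lookahead would otherwise succeed.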

# Lookaheads will have company from lookbehinds soon

Lookbehinds will work the same way, but for patterns before the matching part (lookaheads consider the pattern after it), and they are already supported in Chrome today. They come in two flavors as well: the positive lookbehind (?<=y)x (matches 'x' when it's preceded by 'y') and the negative lookbehind (?<!y)x (matches 'x' when it's not preceded by 'y').
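Because a lookbehind inspects the text before the current position, it is written in front of the part it guards. A minimal sketch with single characters, assuming an engine with ES2018 lookbehind support:

```javascript
// Positive lookbehind: 'x' matches only when it is preceded by 'y'.
const positiveBehind = /(?<=y)x/;
// Negative lookbehind: 'x' matches only when it is not preceded by 'y'.
const negativeBehind = /(?<!y)x/;

console.log(positiveBehind.test('yx')); // true
console.log(positiveBehind.test('zx')); // false
console.log(negativeBehind.test('yx')); // false
console.log(negativeBehind.test('zx')); // true

// Like a lookahead, the lookbehind is not part of the match.
console.log('yx'.match(positiveBehind)[0]); // 'x'
```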

If we flip the strings in the example around, the same approach works with a lookbehind. :)

```javascript
const people = `
- (vegetarian) Bob
- (vegan) Billa
- Francis
- (vegetarian) Elli
- (vegan) Fred
`;

const regex = /(?<=\(vegan\))\s(\w+)/g;
//             |------------|  |---|
//                  /             \__
//         positive lookbehind        \
//       => preceded by "(vegan)"    one or more
//                                  word characters
//                                  (as many as possible)

let result = regex.exec(people);

while (result) {
  console.log(result[1]);
  result = regex.exec(people);
}

// Result:
// Billa
// Fred
```
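The negative lookbehind works on the flipped list, too. Finding everyone who is not vegan takes a bit more structure, because names without a label sit right behind the dash. The following is one possible sketch, not the only way to write it; it assumes the m flag so that ^ and $ match at line boundaries:

```javascript
const people = `
- (vegetarian) Bob
- (vegan) Billa
- Francis
- (vegetarian) Elli
- (vegan) Fred
`;

// (?:\(\w+\) )? optionally consumes a "(diet) " label,
// (?<!\(vegan\) ) rejects positions directly preceded by "(vegan) ",
// and (\w+)$ captures the name up to the end of the line.
const regex = /^- (?:\(\w+\) )?(?<!\(vegan\) )(\w+)$/gm;

const notVegan = [];
let result;
while ((result = regex.exec(people)) !== null) {
  notVegan.push(result[1]);
}

console.log(notVegan); // ['Bob', 'Francis', 'Elli']
```

For "- (vegan) Billa" the lookbehind fails after the label, and skipping the optional label leaves "(" where the name should start, so that line does not match at all.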

Sidenote: I usually recommend RegExr for fiddling with regular expressions, but it doesn't support lookbehinds yet.

If you're interested in more cutting-edge features, have a look at Mathias' and Benedikt's slides on new features coming to JavaScript; there's way more exciting stuff to come.

