This post is part of my Today I learned series in which I share all my learnings regarding web development.

Regular expressions (regex) are a challenge by themselves. For me, it always takes a few minutes until I understand what a particular regular expression does. They're magical and there is no question about their usefulness.

Today, I just had my Sunday morning coffee and worked myself through the slide deck "What's new in ES2018" by Benedikt Meurer and Mathias Bynens.

There is so much useful information in these slides. Besides new language features such as async iteration, object spread properties and named capture groups in regular expressions (🎉), they cover regular expression lookaheads (and the upcoming lookbehinds).

Lookaheads in JavaScript regular expressions cross my path now and then, and I have to admit that I've never had to use them. But now that their counterpart, lookbehinds, is coming to the language too, I decided to read some documentation and finally learn what regex lookaheads and lookbehinds are.

Regex lookaheads in JavaScript

You can define patterns that only match when they're followed or not followed by another pattern with lookaheads.

The MDN article about regular expressions describes two different types of lookaheads in regular expressions.

Positive and negative lookaheads:

  • x(?=y) – positive lookahead (matches 'x' when it's followed by 'y')
  • x(?!y) – negative lookahead (matches 'x' when it's not followed by 'y')
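Here is a minimal sketch of both forms. The sample strings and variable names are made up for illustration and don't come from the MDN article.

```javascript
// Positive lookahead: match "x" only when "y" comes right after it.
const positive = /x(?=y)/;
// Negative lookahead: match "x" only when "y" does NOT come right after it.
const negative = /x(?!y)/;

console.log(positive.test('xy')); // true
console.log(positive.test('xz')); // false
console.log(negative.test('xy')); // false
console.log(negative.test('xz')); // true
```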

Captured groups in JavaScript – the similar-looking companions

Oh well... x(?=y) – that's a tricky syntax if you ask me. What confused me initially is that I usually use () for captured groups or non-capturing groups in JavaScript expressions.

Let's look at an example of a captured group:

const regex = /\w+\s(\w+)\s\w+/;

regex.exec('eins zwei drei');
// ['eins zwei drei', 'zwei']
//                      /\
//                      ||
//                captured group
//                 defined with
//                    (\w+)

The regular expression above captures a word (zwei in this case) that has a whitespace character and another word on each side.
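To make the difference between captured and non-capturing groups visible, here is a small sketch that reuses the string from above. The variable names are my own.

```javascript
const sample = 'eins zwei drei';

// (\w+) is a capturing group: its text shows up as an extra array entry.
const capturing = /\w+\s(\w+)\s\w+/.exec(sample);
// (?:\w+) is a non-capturing group: it groups, but adds no array entry.
const nonCapturing = /\w+\s(?:\w+)\s\w+/.exec(sample);

console.log(capturing.length);    // 2
console.log(capturing[1]);        // 'zwei'
console.log(nonCapturing.length); // 1
```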

Regular expression lookaheads are not like captured groups

Let's look at a typical example that you'll find when you read about lookaheads in JavaScript regular expressions.

// use positive regex lookahead
const regex = /Max(?= Mustermann)/;

regex.exec('Max Mustermann');
// ['Max']
regex.exec('Max Müller');
// null

This example matches Max whenever it is followed by a space and Mustermann. Otherwise, it doesn't match and returns null. The interesting part for me is that it only matches Max and not the pattern defined in the lookahead ((?= Mustermann)). This exclusion can seem weird after working with regular expressions for a while, but when you think about it, that's the difference between lookaheads and groups. Using lookaheads, you can test strings against patterns without including them in the resulting match.
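One way to see that the lookahead is not part of the match is to replace with it. This is only a sketch, and the replacement name 'Erika' is an arbitrary example value.

```javascript
const lookahead = /Max(?= Mustermann)/;
const match = lookahead.exec('Max Mustermann');

console.log(match[0]);        // 'Max'
console.log(match[0].length); // 3

// Only "Max" is replaced; " Mustermann" was checked but not consumed.
const withLookahead = 'Max Mustermann'.replace(lookahead, 'Erika');
console.log(withLookahead); // 'Erika Mustermann'

// Without the lookahead, the whole pattern is part of the match.
const withoutLookahead = 'Max Mustermann'.replace(/Max Mustermann/, 'Erika');
console.log(withoutLookahead); // 'Erika'
```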

The "Max Mustermann" example is not very useful, though. Let's dive into positive and negative lookaheads with a real-world use case.

Positive regex lookahead in JavaScript

Let's assume you have a long string of Markdown that includes a list of people and their food preferences. How would you figure out which people are vegan when everything's just a long string?

const people = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;

// use positive regex lookahead
const regex = /-\s(\w+?)\s(?=\(vegan\))/g;
//                |----|  |-----------|
//                  /            \
//           one or more           \
//           word characters     positive lookahead
//           but as few as       => followed by "(vegan)"
//           possible

let result = regex.exec(people);

while(result) {
  console.log(result[1]);
  result = regex.exec(people);
}

// Result:
// Billa
// Fred

Let's have a quick look at the regular expression and try to phrase it in words.

const regex = /-\s(\w+?)\s(?=\(vegan\))/g;

Alright... let's do this!

Match any dash followed by one whitespace character, followed by one or more word characters (A-Za-z0-9_), but as few as possible, followed by a whitespace character, when all of that is followed by the pattern "(vegan)".
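If your environment supports String.prototype.matchAll (ES2020), the exec loop can be written more compactly. This is an alternative sketch, not what the example above uses, and the names peopleList and veganRegex are mine.

```javascript
const peopleList = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;

// Same pattern as above; matchAll requires the g flag.
const veganRegex = /-\s(\w+?)\s(?=\(vegan\))/g;

// Each match is an array like the one exec returns; index 1 is the name.
const vegans = Array.from(peopleList.matchAll(veganRegex), (m) => m[1]);

console.log(vegans); // ['Billa', 'Fred']
```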

Negative/negating regex lookaheads in JavaScript

On the other hand, how would you figure out who is not vegan?

const people = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;

// use negative regex lookahead
const regex = /-\s(\w+)\s(?!\(vegan\))/g;
//                |---|  |-----------|
//                  /          \
//           one or more        \
//           word characters    negative lookahead
//           (greedy)           => not followed by "(vegan)"

let result = regex.exec(people);

while(result) {
  console.log(result[1]);
  result = regex.exec(people);
}

// Result:
// Bob
// Francis
// Elli

Let's have a quick look at the regular expression and try to phrase it in words, too.

const regex = /-\s(\w+)\s(?!\(vegan\))/g;

Match any dash followed by one whitespace character, followed by one or more word characters (A-Za-z0-9_), followed by a whitespace character (which includes line breaks), when all of that is not followed by the pattern "(vegan)".
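The \s in front of the negative lookahead matters more than it looks. The sketch below uses a hypothetical variant of the pattern in which the whitespace is moved into the lookahead. The regex engine then backtracks and happily returns a shortened name. The variable names and the shortened list are my own.

```javascript
const list = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
`;

// Hypothetical variant: the whitespace lives inside the lookahead.
// For "Billa" the engine gives back the "a", and "Bill" is not
// followed by " (vegan)", so the negative lookahead succeeds.
const loose = /-\s(\w+)(?!\s\(vegan\))/g;
const looseNames = Array.from(list.matchAll(loose), (m) => m[1]);
console.log(looseNames); // ['Bob', 'Bill', 'Francis']

// A word boundary (\b) pins the end of the name and prevents that.
const strict = /-\s(\w+)\b(?!\s\(vegan\))/g;
const strictNames = Array.from(list.matchAll(strict), (m) => m[1]);
console.log(strictNames); // ['Bob', 'Francis']
```

In the pattern from this post, the \s between the name and the lookahead plays the same role as the \b here.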

Regex lookaheads will have company from lookbehinds soon

Lookbehinds work the same way but for leading patterns. Lookaheads consider the pattern after the matching part, whereas lookbehinds consider the pattern before it. Lookbehinds are supported in Chrome today. They come as a positive lookbehind, (?<=y)x, and a negative lookbehind, (?<!y)x.
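As a quick sketch of the syntax: the lookbehind group sits in front of the part you want to match. The price string is a made-up example, and the snippet assumes an engine that already supports lookbehinds.

```javascript
const priceText = 'Price: $42, qty 7';

// Positive lookbehind: digits that are preceded by "$".
const afterDollar = /(?<=\$)\d+/.exec(priceText);
console.log(afterDollar[0]); // '42'

// Negative lookbehind: a number that is NOT preceded by "$".
// The \b stops the engine from starting in the middle of "42".
const notAfterDollar = /(?<!\$)\b\d+/.exec(priceText);
console.log(notAfterDollar[0]); // '7'
```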

When we flip the example strings around and adjust the regular expression to use lookbehinds, everything still works.

const people = `
- (vegetarian) Bob
- (vegan) Billa
- Francis
- (vegetarian) Elli
- (vegan) Fred
`;

// use positive regex lookbehind
const regex = /(?<=\(vegan\))\s(\w+)/g;
//             |------------|  |---|
//                  /             \__
//         positive lookbehind        \
//       => following "(vegan)"     one or more
//                                  word characters
//                                  (greedy)

let result = regex.exec(people);

while(result) {
  console.log(result[1]);
  result = regex.exec(people);
}

// Result:
// Billa
// Fred

Side note: I usually recommend RegExr for fiddling with regular expressions, but it doesn't support lookbehinds yet.

Additional resources

If you're interested in more cutting-edge features, have a look at Mathias' and Benedikt's slides on new features coming to JavaScript. There is way more exciting stuff to come.

Another side note: If you're developing in the browser, make sure to check the support of lookbehinds first. At the time of writing, they're not supported in Firefox.
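One possible way to check for support at runtime is to try constructing a regular expression with a lookbehind and catch the error. This is only a sketch. The helper name supportsLookbehind, the sample text and the fallback pattern are my own assumptions.

```javascript
// Hypothetical helper: returns true when the engine can parse lookbehinds.
function supportsLookbehind() {
  try {
    // Built from a string so that an engine without lookbehind support
    // throws here at runtime instead of failing to parse the whole script.
    new RegExp('(?<=a)b');
    return true;
  } catch (error) {
    return false;
  }
}

const text = 'Price: $42';
const amount = supportsLookbehind()
  ? new RegExp('(?<=\\$)\\d+').exec(text)[0]
  : /\$(\d+)/.exec(text)[1]; // fallback: capture group instead of lookbehind

console.log(amount); // '42'
```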

To remember the syntax for lookaheads and lookbehinds, I created a quick cheat sheet about it.
