Backreferences in JavaScript regular expressions
Today I was preparing a slide deck about new features in JavaScript regular expressions and came across the article "Named capture groups" written by Axel Rauschmayer. The section about backreferences caught my eye.
In some situations, you might want to create a regular expression that includes repeated character sequences, like /(abc)(abc)(abc)/. Instead of copying the character group several times, you'd rather write it once and reuse it. Is this possible in regular expressions? You bet!
When you define a regular expression, you can refer back to previous capture groups via \1, \2, etc. Such a backreference matches the exact text that the referenced group captured.
/(🍕)(🌯)\1\2/.exec('🍕🌯🍕🌯');
// (3) ["🍕🌯🍕🌯", "🍕", "🌯", index: 0, input: "🍕🌯🍕🌯", ... ]
// Match:
// - a pizza
// - a burrito
// - a pizza (backreferenced)
// - a burrito (backreferenced)
/(🍕)(🌯)\1\2/.exec('🍕🌯🍕');
// null (because one burrito is missing)
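A detail that is easy to miss: a backreference repeats the text the group actually matched, not the group's pattern. The following sketch rewrites the /(abc)(abc)(abc)/ example from above with \1 and then shows where "same text" and "same pattern" differ. The variable names are only for illustration.

```javascript
// The /(abc)(abc)(abc)/ pattern from above, written with backreferences.
const repeated = /(abc)\1\1/;
console.log(repeated.test('abcabcabc')); // true
console.log(repeated.test('abcabc')); // false

// \1 repeats the captured text, not the pattern (a|b).
const sameLetterTwice = /^(a|b)\1$/;
console.log(sameLetterTwice.test('aa')); // true
console.log(sameLetterTwice.test('ab')); // false, \1 has to be "a" again

// To repeat the pattern itself, use a quantifier instead.
const anyTwoLetters = /^(?:a|b){2}$/;
console.log(anyTwoLetters.test('ab')); // true
```

So backreferences are a shortcut for repeated literal groups like (abc), but for alternations or character classes they are stricter than copying the group.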
You can do the same for named capture groups via \k<name>.
/(?<one>🍕)(?<two>🌯)\k<one>\k<two>/.exec('🍕🌯🍕🌯');
// (3) ["🍕🌯🍕🌯", "🍕", "🌯", index: 0, input: "🍕🌯🍕🌯", groups: {…}]
// Match:
// - a pizza
// - a burrito
// - a pizza (backreferenced via the named capture group 'one')
// - a burrito (backreferenced via the named capture group 'two')
/(?<one>🍕)(?<two>🌯)\k<one>\k<two>/.exec('🍕🌯🍕');
// null (because one burrito is missing)
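Named backreferences come in handy for things like spotting accidentally doubled words. The pattern and the sample sentences below are my own illustration, not taken from the article mentioned above.

```javascript
// Finds a word that is immediately repeated, such as "the the".
const doubledWord = /\b(?<word>\w+)\s+\k<word>\b/;

const match = doubledWord.exec('I saw the the pizza');
console.log(match.groups.word); // "the"
console.log(match.index); // 6

// Without a repeated word, exec returns null.
console.log(doubledWord.exec('I saw the pizza')); // null
```

The trailing \b keeps the pattern from matching inputs like "the theory", where the second word only starts with the captured text.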
Arnd Issler pointed out that you can't talk about backreferences in regular expressions without also mentioning the references you can use when replacing parts of a String.
So, here we go. 😊
Replacement references for capture groups
Turns out, if you use replace with a regular expression, you can reference the included capture groups using $1, $2, etc.
MDN provides a good example to swap words using references.
const re = /(\w+)\s(\w+)/;
const str = 'Jane Smith';
const newstr = str.replace(re, '$2, $1');
console.log(newstr); // Smith, Jane
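One thing to keep in mind: with the g flag, replace substitutes $1 and $2 separately for every match. Here is a small variation of the MDN example to illustrate that; the sample string and variable names are made up.

```javascript
// Same pattern as the MDN example, but with the g flag:
// every match gets its own $1 and $2.
const namePattern = /(\w+)\s(\w+)/g;
const names = 'Jane Smith, John Doe';

const swapped = names.replace(namePattern, '$2 $1');
console.log(swapped); // "Smith Jane, Doe John"
```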
To stay with the earlier examples, have a look at the following "pizza-burrito snippet":
'🍕🌯🍕'.replace(
  /(🍕)(🌯)\1/,
  'first group: $1, second group: $2'
);
// "first group: 🍕, second group: 🌯"
Note that when you use $1 and $2 in the replacement string passed to replace, they refer to the first and second capture groups of the regular expression.
$1 references the 🍕, and $2 references the 🌯. Wild stuff!
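Two edge cases are worth knowing about. A group that exists in the regular expression but didn't take part in the match is replaced with an empty string, while a reference like $3 to a group that doesn't exist at all stays in the result as literal text. The following snippets are a quick sketch of both cases.

```javascript
// (🍕)? is optional and does not participate here, so $1 becomes "".
const skipped = '🌯'.replace(/(🍕)?(🌯)/, '[$1][$2]');
console.log(skipped); // "[][🌯]"

// The regex has only two groups, so "$3" stays as literal text.
const missing = '🍕🌯'.replace(/(🍕)(🌯)/, '$1 $2 $3');
console.log(missing); // "🍕 🌯 $3"
```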
But since sequences such as $1 and $2 reference capture groups, you might wonder how to insert a literal $1 without referencing a capture group. In that case, you can use $$1: the sequence $$ inserts a single $ character, so the 1 that follows is treated as plain text.
'🍕🌯🍕'.replace(
  /(🍕)(🌯)\1/,
  '$$1 $$2'
);
// "$1 $2"
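The $$ escape also combines with real references. In the following made-up example, $$ produces the dollar sign and the $1 right after it still inserts the captured number.

```javascript
// "$$" becomes a literal "$", then "$1" inserts the captured digits.
const price = 'Price: 10 USD';
const formatted = price.replace(/(\d+) USD/, '$$$1');
console.log(formatted); // "Price: $10"
```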
Replacement references for named capture groups
The same reference functionality works for named capture groups using $<name>:
'🍕🌯🍕'.replace(
  /(?<one>🍕)(?<two>🌯)\k<one>/,
  'first group: $<one>, second group: $<two>'
);
// "first group: 🍕, second group: 🌯"
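Applied to the word swap from the MDN example, named groups make the replacement string easier to read. Named groups are still numbered as well, so $1 and $2 keep working. The group names first and last are my own choice for this sketch.

```javascript
// Named groups make the word swap from the MDN example self-documenting.
const fullName = /(?<first>\w+)\s(?<last>\w+)/;
const byName = 'Jane Smith'.replace(fullName, '$<last>, $<first>');
console.log(byName); // "Smith, Jane"

// Named groups are numbered too, so $1 and $2 still work.
const byNumber = 'Jane Smith'.replace(fullName, '$2, $1');
console.log(byNumber); // "Smith, Jane"
```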
Similarly, if you want to insert a literal $<name> while a named capture group is present, you can use $$<name>:
'🍕🌯🍕🌯🍕🌯'.replace(