Groups, as the name suggests, are meant to be used to “group” components of regular expressions. These groups can be used to:
- Extract subsets of matches
- Repeat groups an arbitrary number of times
- Refer to previously matched substrings
- Enhance readability
- Allow complex alternations
We’ll see how to do most of this in later chapters. Learning how groups work now will let us study some great examples there.
Capturing groups are denoted by parentheses: ( … ). Here’s an expository example:
Capturing groups allow extracting parts of matches.
Using your language’s regex functions, you would be able to extract the text between the matched braces for each of these strings.
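As a minimal sketch in Python’s `re` module, here is one way to pull out the text between braces. The sample strings and the pattern `\{([^}]*)\}` are illustrative assumptions: `\{` and `\}` match literal braces, and the capturing group holds everything in between.

```python
import re

# Hypothetical sample strings, chosen only for illustration.
samples = ["{alpha}", "value: {beta}", "{gamma} and more"]

# \{ and \} match literal braces; ([^}]*) captures everything between them.
pattern = re.compile(r"\{([^}]*)\}")

extracted = []
for s in samples:
    match = pattern.search(s)
    if match:
        # group(1) is the text captured by the first (and only) capturing group.
        extracted.append(match.group(1))

print(extracted)  # ['alpha', 'beta', 'gamma']
```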
Capturing groups can also be used to group parts of a regex so that the whole group can be repeated. We will cover repetition in detail in the chapters that follow, but here’s an example that demonstrates the utility of groups:
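A small Python sketch of the difference a group makes to repetition. The `ha` patterns are illustrative assumptions: without parentheses, `+` applies only to the preceding character, while with parentheses it repeats the whole group.

```python
import re

# "(ha)+" repeats the whole group: "ha", "haha", "hahaha", ...
laugh = re.compile(r"(ha)+")

print(bool(laugh.fullmatch("hahaha")))     # True
print(bool(laugh.fullmatch("haaa")))       # False

# Without a group, + applies only to "a": "ha+" matches "haaa".
print(bool(re.fullmatch(r"ha+", "haaa")))  # True

# A repeated capturing group keeps only its last iteration.
print(laugh.fullmatch("hahaha").group(1))  # ha
```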
Other times, they are used to group logically similar parts of the regex for readability.
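For instance, a date pattern reads more clearly when each logical part sits in its own group. The YYYY-MM-DD format below is an assumed example, and the groups also happen to capture each part.

```python
import re

# Year, month and day are grouped separately, which mirrors the structure of the date.
date = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

m = date.fullmatch("2021-03-14")
year, month, day = m.groups()
print(year, month, day)  # 2021 03 14
```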
Backreferences allow referring to previously captured substrings.
The match of the first group is referred to as \1, that of the second as \2, and so on.
Backreferences cannot be used to reduce duplication in regexes. They refer to the match of groups, not the pattern.
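A short Python sketch of the distinction, using an assumed digits-hyphen-digits pattern. `\1` stands for the exact text the first group matched, not for the subpattern `\d+`.

```python
import re

# (\d+)-\1 means "some digits, a hyphen, then the SAME digits again".
same = re.compile(r"(\d+)-\1")

print(bool(same.fullmatch("42-42")))  # True
print(bool(same.fullmatch("42-17")))  # False

# To allow any digits on both sides, the pattern itself has to be repeated.
print(bool(re.fullmatch(r"\d+-\d+", "42-17")))  # True
```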
Here’s an example that demonstrates a common use case:
This cannot be achieved with a repeated character class.
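One common use case of this kind is matching a quoted string whose closing quote must be the same character as its opening quote. The sample sentence is an illustrative assumption. The backreference version enforces matching quotes. The character-class version lets any quote close any other, so it stops at the apostrophe.

```python
import re

text = """He said "it's fine" and left."""

# Backreference: the closing quote must equal the captured opening quote.
with_backref = re.compile(r"""(["']).*?\1""")
# Repeated character class: any quote character can close the match.
with_class = re.compile(r"""["'].*?["']""")

print(with_backref.search(text).group(0))  # "it's fine"
print(with_class.search(text).group(0))    # "it'
```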
Non-capturing groups are very similar to capturing groups, except that they don’t create “captures”. They take the form (?: … ).
Non-capturing groups are usually used in conjunction with capturing groups. Suppose you are extracting some parts of the matches using capturing groups. You may want an additional group, for example for an alternation, without shifting the numbering of the captures. This is where non-capturing groups come in handy.
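A Python sketch of keeping capture numbers stable. The URL pattern is a simplified, assumed example rather than a full URL parser. The scheme alternation needs a group, but making it non-capturing keeps the host as group 1.

```python
import re

# (?:https?|ftp) groups the alternation without capturing it.
url = re.compile(r"(?:https?|ftp)://([^/\s]+)")

m = url.match("https://example.com/path")
print(m.group(1))       # example.com
print(len(m.groups()))  # 1

# With a capturing group for the scheme, the host shifts to group 2.
m2 = re.match(r"(https?|ftp)://([^/\s]+)", "https://example.com/path")
print(m2.group(2))      # example.com
```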
Query String Parameters
We match the first key-value pair separately because that allows us to use &, the separator, as part of the repeating group.
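A minimal Python sketch of that structure, assuming the query string starts with `?` and that keys and values consist of word characters only (`\w`). Real query strings also allow percent-encoding and other characters. The first pair is matched on its own, and each later pair begins with `&` inside a repeated non-capturing group.

```python
import re

# First key=value pair, then zero or more "&key=value" pairs.
query = re.compile(r"\?\w+=\w*(?:&\w+=\w*)*")

for s in ["?a=1", "?a=1&b=2&c=", "?a=1&", "?&a=1", "a=1"]:
    print(s, bool(query.fullmatch(s)))
```

If the repeated group were capturing, Python would keep only the last pair it matched, which is why extraction of every pair is usually done with a separate search rather than a single repeated group.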
(Basic) HTML tags
However, it’s a relevant example:
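As a rough Python sketch, a backreference can require the closing tag to match the opening tag’s name. The pattern and input are illustrative assumptions. The pattern does not handle nested tags of the same name, comments, or other real-world HTML.

```python
import re

html = "<b>bold</b> and <i>italic</i> but <b>mismatched</i>"

# (\w+) captures the tag name, [^>]* skips attributes, (.*?) lazily captures the
# content, and </\1> requires a closing tag with the same name.
tag = re.compile(r"<(\w+)[^>]*>(.*?)</\1>")

print(tag.findall(html))  # [('b', 'bold'), ('i', 'italic')]
```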
John Doe
Jane Doe
Sven Svensson
Janez Novak
Janez Kranjski
Tim Joe

Doe, John
Doe, Jane
Svensson, Sven
Novak, Janez
Kranjski, Janez
Joe, Tim
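A Python sketch of one way to produce the second list from the first, assuming every line holds exactly one first name and one last name made of word characters. Two capturing groups hold the names, and the replacement swaps them. Python writes the references as `\2` and `\1` in the replacement, where many other tools use `$2` and `$1`.

```python
import re

names = """John Doe
Jane Doe
Sven Svensson
Janez Novak
Janez Kranjski
Tim Joe"""

# ^ and $ anchor each line because of re.MULTILINE.
swapped = re.sub(r"^(\w+) (\w+)$", r"\2, \1", names, flags=re.MULTILINE)
print(swapped)
```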
Backreferences and plurals
This is a paragraph with some words. Some instances of the word "word" are in their plural form: "words". Yet, some are in their singular form: "word".
This is a paragraph with some phrases. Some instances of the phrase "phrase" are in their plural form: "phrases". Yet, some are in their singular form: "phrase".
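A Python sketch of one way to turn the first paragraph into the second. The pattern `\bword(s?)\b` is an assumption. It matches “word” or “words” as a whole word, and `(s?)` captures the optional plural “s” (possibly an empty string), which `\1` restores after “phrase”.

```python
import re

text = ('This is a paragraph with some words. Some instances of the word "word" '
        'are in their plural form: "words". Yet, some are in their singular form: "word".')

# \1 re-inserts whatever (s?) captured: "s" for plurals, "" for singulars.
result = re.sub(r"\bword(s?)\b", r"phrase\1", text)
print(result)
```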
- In replacement contexts, $2, … are usually used in place of \2, … to refer to captured strings.