I am studying for the Java OCP and at the moment I am stuck on the "Capturing groups" section. The description is way too abstract. Could you please (if you have time) give me some real examples using capturing groups?
Is anybody able to provide me with a concrete example of the following statement?
Capturing groups are a way to treat multiple characters as a single
unit. They are created by placing the characters to be grouped inside
a set of parentheses. For example, the regular expression (dog)
creates a single group containing the letters "d" "o" and "g". The
portion of the input string that matches the capturing group will be
saved in memory for later recall via backreferences (as discussed
below in the section, Backreferences).
I am pretty sure I'll get it as soon as I see a concrete example.
Thanks in advance.
Among other things, regex lets you obtain portions of the input that were matched by various parts of the regular expression. Sometimes you need the entire match, but often you need only a part of it. For example, this regular expression matches "Page X of Y" strings:
Page \d+ of \d+
If you pass it a string
Page 14 of 203
you will match the entire string. Now let's say that you want only 14 and 203. No problem: the regex library lets you enclose the two \d+ in parentheses, and then retrieve only the "14" and "203" strings from the match.
Page (\d+) of (\d+)
The above expression creates two capturing groups. The Matcher object obtained by matching the pattern lets you retrieve the content of these groups individually:
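import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each parenthesized \d+ is a capturing group, numbered from 1, left to right.
Pattern p = Pattern.compile("Page (\\d+) of (\\d+)");
String text = "Page 14 of 203";
Matcher m = p.matcher(text);
if (m.find()) {
    System.out.println(m.group(1)); // first group: the page number
    System.out.println(m.group(2)); // second group: the page count
}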
This prints 14 and 203.
Demo on ideone.
Capturing groups let you query the Matcher to find out which part of the string matched a particular part of the regular expression. See this example:
String dateStr = "1981-06-25";
// The separator in the pattern must be the one used in the input: '-', not '/'.
Pattern datePatt = Pattern.compile("([0-9]{4})-([0-9]{2})-([0-9]{2})");
...
Matcher m = datePatt.matcher(dateStr);
if (m.matches()) {
    int year = Integer.parseInt(m.group(1));
    int month = Integer.parseInt(m.group(2));
    int day = Integer.parseInt(m.group(3));
}
The variables year, month and day contain the values of groups 1, 2 and 3, respectively.
It's for when you want to keep track of parts of the match. For example, if you have the regex
/^(http|ftp).*/
and you get a match, you can query the match for the group, and tell if it was http or ftp.
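In Java, that query could look roughly like the sketch below. The class name SchemeDemo and the method schemeOf are made up for illustration, and the surrounding slashes are dropped because they are delimiter syntax from other languages, not part of a Java pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SchemeDemo {
    // Group 1 captures whichever alternative matched: "http" or "ftp".
    private static final Pattern SCHEME = Pattern.compile("^(http|ftp).*");

    // Hypothetical helper: returns the captured scheme, or null if there is no match.
    static String schemeOf(String url) {
        Matcher m = SCHEME.matcher(url);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("ftp://example.com/file.txt"));  // ftp
        System.out.println(schemeOf("http://example.com/"));         // http
        System.out.println(schemeOf("mailto:someone@example.com"));  // null
    }
}
```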
For example take the regex
cat (dog )?bus
This will match both the strings "cat dog bus" and "cat bus". That's because the entire "dog " part is optional because of the ?. If you did not wrap it in parentheses, then only the last space would be optional.
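Here is a small Java sketch of the optional group, assuming you want to see what group 1 holds in each case. The class OptionalGroupDemo and the method describe are illustrative names only. Note that group(1) returns null when the optional group took no part in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupDemo {
    private static final Pattern P = Pattern.compile("cat (dog )?bus");

    // Hypothetical helper: reports whether the input matches and what group 1 captured.
    static String describe(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        String g = m.group(1); // null if the optional group was skipped
        return g == null ? "match, group 1 not used" : "match, group 1 = [" + g + "]";
    }

    public static void main(String[] args) {
        System.out.println(describe("cat dog bus")); // match, group 1 = [dog ]
        System.out.println(describe("cat bus"));     // match, group 1 not used
        System.out.println(describe("cat dogbus"));  // no match
    }
}
```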
James while John (had )+a better effect on the teacher
will match the string
James while John had had had had had had had had had had had a better effect on the teacher
because the + applies to the whole group, so it matches one or more repetitions of the entire "had " string.
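A short Java sketch of the repeated group follows, using a shorter sentence than the one above. The class name RepeatedGroupDemo and its methods are made up for illustration. One detail worth knowing: when a group is repeated, Java's Matcher keeps only the text from its last repetition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    private static final Pattern P =
            Pattern.compile("James while John (had )+a better effect on the teacher");

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean matches(String input) {
        return P.matcher(input).matches();
    }

    // Hypothetical helper: text of group 1 after a full match, or null if no match.
    static String lastRepetition(String input) {
        Matcher m = P.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "James while John had had had a better effect on the teacher";
        System.out.println(matches(s));                      // true
        System.out.println("[" + lastRepetition(s) + "]");   // [had ]
        // Zero repetitions are not allowed by +, so this does not match.
        System.out.println(matches("James while John a better effect on the teacher")); // false
    }
}
```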
You can also use alternation and back references with capture groups (something you haven't quite gotten to yet).
(cat|dog) is a \1
The \1 is a reference to whatever was captured in the first capture group. This will match "dog is a dog" and "cat is a cat", but not "dog is a cat" or vice versa.
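The backreference can be tried out in Java with a sketch like this one. BackrefDemo and sameAnimal are illustrative names. In a Java string literal the backslash has to be doubled, so \1 is written as "\\1".

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // \1 must repeat exactly the text that group 1 captured.
    private static final Pattern P = Pattern.compile("(cat|dog) is a \\1");

    // Hypothetical helper: true if the whole sentence matches the pattern.
    static boolean sameAnimal(String sentence) {
        return P.matcher(sentence).matches();
    }

    public static void main(String[] args) {
        System.out.println(sameAnimal("dog is a dog")); // true
        System.out.println(sameAnimal("cat is a cat")); // true
        System.out.println(sameAnimal("dog is a cat")); // false
        System.out.println(sameAnimal("cat is a dog")); // false
    }
}
```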
Here you see some code examples you can easily understand.
Basically what you have within ()
you have remembered after the match. And you can see the string matching that group. Remember that if you do a second match, these values are replaced by the second match so if you need them, you need to save them immediately after match in some variabled defined by you.
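To illustrate the point about saving values, here is a rough Java sketch that calls find() repeatedly and copies the groups into a list before moving on to the next match. The class SaveGroupsDemo, the method collect, and the "page/total" output format are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SaveGroupsDemo {
    private static final Pattern P = Pattern.compile("Page (\\d+) of (\\d+)");

    // Hypothetical helper: collects "page/total" for every match in the text.
    static List<String> collect(String text) {
        List<String> saved = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            // Copy the groups now: the next find() overwrites them.
            saved.add(m.group(1) + "/" + m.group(2));
        }
        return saved;
    }

    public static void main(String[] args) {
        System.out.println(collect("Page 1 of 3, then Page 2 of 3")); // [1/3, 2/3]
    }
}
```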