Trying to understand “Capturing groups” in regex w

Posted 2019-05-01 20:25

Question:

I am studying for the Java OCP and at the moment I am stuck on the "Capturing groups" section. The description is way too abstract. Could you please (if you have time) give me some real examples using "Capturing groups"?

Is anybody able to provide me with a concrete example of the following statement?

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d" "o" and "g". The portion of the input string that matches the capturing group will be saved in memory for later recall via backreferences (as discussed below in the section, Backreferences).

I am pretty sure I'll get it as soon as I see a concrete example.

Thanks in advance.

Answer 1:

Among other things, regex lets you obtain portions of the input that were matched by various parts of the regular expression. Sometimes you need the entire match, but often you need only a part of it. For example, this regular expression matches "Page X of Y" strings:

Page \d+ of \d+

If you pass it a string

Page 14 of 203

you will match the entire string. Now let's say that you want only 14 and 203. No problem: the regex library lets you enclose the two \d+ in parentheses and then retrieve only the "14" and "203" strings from the match.

Page (\d+) of (\d+)

The above expression creates two capturing groups. The Matcher object obtained by matching the pattern lets you retrieve the content of these groups individually:

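import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each pair of parentheses defines one capturing group, numbered from 1.
Pattern p = Pattern.compile("Page (\\d+) of (\\d+)");
String text = "Page 14 of 203";
Matcher m = p.matcher(text);
if (m.find()) {
    System.out.println(m.group(1)); // first group: the current page
    System.out.println(m.group(2)); // second group: the total page count
}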

This prints 14 and 203.




Answer 2:

Capturing groups let you query the Matcher to find out which part of the string matched a particular part of the regular expression. See this example:

String dateStr = "1981-06-25";

Pattern datePatt = Pattern.compile("([0-9]{4})-([0-9]{2})-([0-9]{2})");
...
Matcher m = datePatt.matcher(dateStr);
if (m.matches()) {
    int year  = Integer.parseInt(m.group(1));
    int month = Integer.parseInt(m.group(2));
    int day   = Integer.parseInt(m.group(3));
}

The variables year, month and day contain the values of groups 1, 2 and 3, respectively.



Answer 3:

It's for when you want to keep track of parts of the match. For example, if you have the regex

/^(http|ftp).*/

and you get a match, you can query the match for the group, and tell if it was http or ftp.
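In Java, that query could look roughly like the sketch below. The class name SchemeDemo and the helper schemeOf are made up for illustration, and the surrounding slashes are dropped because Java patterns are plain strings without delimiters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemeDemo {
    // Group 1 captures whichever alternative matched: "http" or "ftp".
    private static final Pattern SCHEME = Pattern.compile("^(http|ftp).*");

    // Hypothetical helper: returns the captured scheme, or null if there is no match.
    static String schemeOf(String url) {
        Matcher m = SCHEME.matcher(url);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("ftp://example.com/file.txt"));  // ftp
        System.out.println(schemeOf("http://example.com/"));         // http
        System.out.println(schemeOf("mailto:someone@example.com"));  // null
    }
}
```

Note that an https URL would also match here, with group 1 holding "http", because the trailing .* absorbs the rest of the string.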



Answer 4:

For example take the regex

cat (dog )?bus

This will match both the strings cat dog bus and cat bus, because the ? makes the entire dog part (including its trailing space) optional. If you did not wrap it in parentheses, only the last space would be optional.

James while John (had )+a better effect on the teacher

will match the string

James while John had had had had had had had had had had had a better effect on the teacher

as it will match one or more of the entire had string.
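Here is a small Java sketch of both quantified groups, in case it helps to see them run. The class and method names are invented for the example. It also shows that group(1) comes back as null when an optional group took no part in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifiedGroupDemo {
    // Hypothetical helper: describes how "cat (dog )?bus" matched the input.
    static String optionalGroup(String input) {
        Matcher m = Pattern.compile("cat (dog )?bus").matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        // group(1) is null when the optional group did not participate in the match.
        return m.group(1) == null
                ? "matched, group 1 absent"
                : "matched, group 1 = [" + m.group(1) + "]";
    }

    // Without parentheses, the ? applies only to the space before "bus".
    static boolean withoutParens(String input) {
        return input.matches("cat dog ?bus");
    }

    // (had )+ requires one or more repetitions of "had ".
    static boolean repeatedGroup(String input) {
        return input.matches("James while John (had )+a better effect on the teacher");
    }

    public static void main(String[] args) {
        System.out.println(optionalGroup("cat dog bus")); // matched, group 1 = [dog ]
        System.out.println(optionalGroup("cat bus"));     // matched, group 1 absent
        System.out.println(withoutParens("cat bus"));     // false
        System.out.println(withoutParens("cat dogbus"));  // true
        System.out.println(repeatedGroup(
                "James while John had had a better effect on the teacher")); // true
        System.out.println(repeatedGroup(
                "James while John a better effect on the teacher"));         // false
    }
}
```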

You can also use alternation and back references with capture groups (something you haven't quite gotten to yet).

(cat|dog) is a \1

The \1 is a reference to whatever was captured in the first capture group. This will match dog is a dog and cat is a cat, but not dog is a cat or vice versa.
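A quick way to check this in Java is sketched below. The class name BackrefDemo and the helper sameAnimal are made up for the example. The backslash has to be doubled inside a Java string literal, so \1 is written as \\1.

```java
import java.util.regex.Pattern;

public class BackrefDemo {
    // \\1 must match exactly the text that group 1 captured.
    private static final Pattern SAME_ANIMAL = Pattern.compile("(cat|dog) is a \\1");

    // Hypothetical helper: true only if both animals in the sentence are the same.
    static boolean sameAnimal(String sentence) {
        return SAME_ANIMAL.matcher(sentence).matches();
    }

    public static void main(String[] args) {
        System.out.println(sameAnimal("dog is a dog")); // true
        System.out.println(sameAnimal("cat is a cat")); // true
        System.out.println(sameAnimal("dog is a cat")); // false
        System.out.println(sameAnimal("cat is a dog")); // false
    }
}
```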



Answer 5:

Here you see some code examples you can easily understand.

Basically, whatever you put inside () is remembered after the match, and you can retrieve the part of the string that matched that group. Keep in mind that if you perform a second match, these values are replaced by those of the second match, so if you still need them, save them in variables of your own immediately after the match.
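To illustrate the point about saving values, here is a minimal Java sketch. The class name SaveGroupsDemo, the helper currentPages and the sample text are all assumptions made for the example. Each call to find() moves on to the next match and overwrites what group(1) returns, so the value is copied into a list right away.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SaveGroupsDemo {
    // Hypothetical helper: collects group 1 from every "Page X of Y" in the text.
    static List<String> currentPages(String text) {
        Matcher m = Pattern.compile("Page (\\d+) of (\\d+)").matcher(text);
        List<String> saved = new ArrayList<>();
        while (m.find()) {
            // After the next find(), group(1) will hold a different value,
            // so store the current one before continuing.
            saved.add(m.group(1));
        }
        return saved;
    }

    public static void main(String[] args) {
        String text = "Page 1 of 9, Page 2 of 9, Page 3 of 9";
        System.out.println(currentPages(text)); // [1, 2, 3]
    }
}
```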