Can Anyone Explain ?: in regular expression [duplicate]

Posted 2020-07-10 09:10

Question:

TCL: Can Anyone Explain ?: in regular expression

I am confused about the difference between ? and ?:.

? means the preceding character (or group) may or may not be present.

So I don't understand what (?:) indicates.

Can anyone please explain this?

([0-9]+(?:\.[0-9]*)?)

Answer 1:

Suppose you were looking for something like ABC123 or ABC123.45 in your input string and wanted to capture the letters and the numbers separately. You would use a regex (a bit similar to yours) like:

([A-Z]+)([0-9]+(\.[0-9]+)?)

The regex above matches ABC123.45 and also provides three groups, which represent sub-parts of the whole match and are determined by where you put the () brackets. So, with this regex (without ?:), we get:

Group 1 = ABC
Group 2 = 123.45
Group 3 = .45
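To see these groups directly in Tcl, `regexp -inline` returns the overall match followed by one element per capturing group. This is a minimal sketch assuming a standard Tcl 8.x interpreter; the variable names `re`, `withDecimal` and `noDecimal` are just illustrative.

```tcl
# Sketch: list the captures produced by the fully capturing regex.
# With -inline, regexp returns the overall match followed by one element
# per capturing group, numbered by the position of each opening "(".
set re {([A-Z]+)([0-9]+(\.[0-9]+)?)}

set withDecimal [regexp -inline $re ABC123.45]
puts $withDecimal   ;# ABC123.45 ABC 123.45 .45

# If the optional group takes no part in the match, its slot is still
# there, holding an empty string.
set noDecimal [regexp -inline $re ABC123]
puts $noDecimal     ;# ABC123 ABC 123 {}
```

Note that the third slot exists even when there is no decimal part, which is exactly the extra group you may not want.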

However, it may not make sense to always capture the decimal portion, since it has already been captured as part of Group 2. So how do you make that () group non-capturing? By putting ?: at the start of the group:

([A-Z]+)([0-9]+(?:\.[0-9]+)?)

Now you get only the two desired groups:

Group 1 = ABC
Group 2 = 123.45
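A practical benefit in Tcl is that you need exactly one match variable per remaining group, with no dummy variable for the decimal part. A minimal sketch, assuming Tcl 8.x; `letters` and `number` are hypothetical variable names:

```tcl
# Sketch: the same pattern with the decimal group made non-capturing.
set re {([A-Z]+)([0-9]+(?:\.[0-9]+)?)}

puts [regexp -inline $re ABC123.45]   ;# ABC123.45 ABC 123.45

# Only two capturing groups remain, so two submatch variables follow the
# overall-match variable (here a variable literally named "->").
if {[regexp $re ABC123.45 -> letters number]} {
    puts "letters=$letters number=$number"   ;# letters=ABC number=123.45
}
```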

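Notice that I also changed the last part of the regex from \.[0-9]* to \.[0-9]+. This requires at least one digit after the dot, so for input like 123. (a dot with no decimal part) the dot is no longer included in the match, and if the pattern is anchored with ^ and $, such input does not match at all.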
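The effect of `*` versus `+` after the dot can be checked on an input that ends in a bare dot. This is a sketch assuming Tcl 8.x; `starNumber` and `plusNumber` are illustrative names, and the anchored patterns drop the capturing groups because only the yes/no result matters there.

```tcl
# Sketch: compare \.[0-9]* with \.[0-9]+ on input that ends in a bare dot.
set star {([A-Z]+)([0-9]+(?:\.[0-9]*)?)}
set plus {([A-Z]+)([0-9]+(?:\.[0-9]+)?)}

# Unanchored, both patterns match; only the * version keeps the dot.
regexp $star ABC123. -> letters starNumber
regexp $plus ABC123. -> letters plusNumber
puts $starNumber   ;# 123.
puts $plusNumber   ;# 123

# Anchored to the whole string, the + version rejects the input outright.
puts [regexp {^[A-Z]+[0-9]+(?:\.[0-9]*)?$} ABC123.]   ;# 1
puts [regexp {^[A-Z]+[0-9]+(?:\.[0-9]+)?$} ABC123.]   ;# 0
```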



Answer 2:

As described in the re_syntax manual page of the Tcl documentation, ?: placed right after the opening parenthesis of a group turns off capturing for that group. For example, the expression (\d)(\d) matches two digits and makes each one available in a separate match group. The expression (\d)(?:\d) matches the same two digits but captures only the first one; the second is matched without getting a group of its own. In Tcl specifically:

regexp {(\d)(\d)} $data -> first second

stores the first and second digits in the variables first and second (the whole match goes into the variable named ->). The non-capturing version (\d)(?:\d) gives only 2 results instead of 3: the whole match and the first digit. So your expression has 2 outputs: one for everything matched and one for the outer parentheses. The inner parentheses group \.[0-9]* so that the ? applies to it as a unit, but they do not produce another matching output. What you get is an expression that matches a decimal number such as 3.1415, 0. or 10.
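A short Tcl session makes the result counts visible. This sketch assumes Tcl 8.x and uses a hypothetical input `data` of `42`; `regexp -inline` returns the overall match followed by one element per capturing group, so the list length shows how many outputs each form produces.

```tcl
set data 42

# Two capturing groups: the overall match goes into the variable named
# "->", then one variable per group.
regexp {(\d)(\d)} $data -> first second
puts "$first $second"                      ;# 4 2

# -inline shows how many results each form produces.
puts [regexp -inline {(\d)(\d)} $data]     ;# 42 4 2
puts [regexp -inline {(\d)(?:\d)} $data]   ;# 42 4

# The question's expression: the overall match plus the outer group only.
# Prints "3.1415 3.1415", then "0. 0.", then "10 10".
foreach s {3.1415 0. 10} {
    puts [regexp -inline {([0-9]+(?:\.[0-9]*)?)} $s]
}
```

For the question's pattern the two outputs are always identical, because the outer group spans the whole expression.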



Answer 3:

?: simply makes a group non-capturing. For example, a(?:b) matches the "ab" in "abc", but no capture group is created for the b.
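The difference is easy to check in Tcl, along with the reason non-capturing groups are useful at all: they still group, so a quantifier can apply to several characters at once. A minimal sketch, assuming Tcl 8.x; `whole` and `repeated` are illustrative variable names.

```tcl
# Sketch: a capturing versus a non-capturing group around "b".
puts [regexp -inline {a(b)} abc]     ;# ab b
puts [regexp -inline {a(?:b)} abc]   ;# ab

# Both match the same text; only the capturing form adds a submatch.
regexp {a(?:b)} abc whole
puts $whole                          ;# ab

# A non-capturing group still groups: + repeats the whole "ab" unit.
regexp {(?:ab)+} ababx repeated
puts $repeated                       ;# abab
```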