TCL: Can Anyone Explain ?: in regular expression
I am confused about the difference between ? and ?:.
? means the preceding character may or may not be present,
but I do not understand what (?:) indicates.
Can anyone please explain it, for example in this expression?
([0-9]+(?:\.[0-9]*)?)
Suppose you were looking for something like ABC123 or ABC123.45 in your input string and you wanted to capture the letters and the numbers separately. You would use a regex (somewhat similar to yours) like
([A-Z]+)([0-9]+(\.[0-9]+)?)
The above regex would match ABC123.45 and provide three groups that represent sub-parts of the whole match, determined by where you put the () brackets. So, given our regex (without using ?:), we get:
Group 1 = ABC
Group 2 = 123.45
Group 3 = .45
Now, it may not make much sense to always capture the decimal portion, since it has already been captured as part of Group 2. So how would you make that () group non-capturing? By using ?: at the start of the group:
([A-Z]+)([0-9]+(?:\.[0-9]+)?)
Now you get only the two desired groups:
Group 1 = ABC
Group 2 = 123.45
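The difference between the two expressions above can be checked directly in Tcl. This is a minimal sketch that assumes the sample input ABC123.45 and uses `regexp -inline`, which returns the whole match followed by each captured group; the variable names are illustrative only.

```tcl
# Sample input taken from the explanation above.
set input "ABC123.45"

# Decimal part captured: whole match plus three groups.
set withCapture [regexp -inline {([A-Z]+)([0-9]+(\.[0-9]+)?)} $input]
puts $withCapture      ;# ABC123.45 ABC 123.45 .45

# Decimal part non-capturing: whole match plus two groups.
set withoutCapture [regexp -inline {([A-Z]+)([0-9]+(?:\.[0-9]+)?)} $input]
puts $withoutCapture   ;# ABC123.45 ABC 123.45
```

The first element of each returned list is always the whole match, so the group count is the list length minus one.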
Notice that I also changed the last part of the regex from \.[0-9]* to \.[0-9]+. This prevents the optional group from matching the trailing dot in 123., i.e. a number that has a dot but no decimal part.
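The effect of changing * to + can be seen on an input with a trailing dot. This sketch assumes the input string 123. and compares the numeric part of both patterns; the variable names are made up for the example.

```tcl
# A number followed by a dot but no decimal digits.
set input "123."

# With *, the optional group may match a bare dot, so the dot is included.
set star [regexp -inline {([0-9]+(?:\.[0-9]*)?)} $input]
puts $star   ;# 123. 123.

# With +, the group needs at least one digit after the dot, so it is skipped
# and only the integer part is matched.
set plus [regexp -inline {([0-9]+(?:\.[0-9]+)?)} $input]
puts $plus   ;# 123 123
```

Note that the + version still finds a match in this input; it simply leaves the dot out of it.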
As mentioned in the re_syntax manual page from the Tcl documentation, ?: at the start of a parenthesized group turns off capturing for that group. In other words, the expression (\d)(\d) matches two digits and makes each one available in a separate match group. The expression (\d)(?:\d) matches the same text but captures only the first digit. Specifically for Tcl:
regexp {(\d)(\d)} $data -> first second
will make the first and second digits available in the named variables. The corresponding non-capturing expression (\d)(?:\d) will not provide three results (the whole match plus two groups) but only two: the whole match and the first digit. So your expression has two outputs: one for everything matched and one for the outermost parentheses. The inner parentheses group part of the pattern but do not produce another match output. So you have something that matches a decimal number (3.1415, 0., 10).
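As a rough illustration of how the match variables are filled, the sketch below assumes a sample value of 42 for data and then runs the expression from the question on the three example numbers. The variable names whole and only are hypothetical.

```tcl
set data "42"

# Two capturing groups: -> receives the whole match, then one variable per group.
regexp {(\d)(\d)} $data -> first second
puts "$first $second"   ;# 4 2

# Second group is non-capturing: only one group variable is needed.
regexp {(\d)(?:\d)} $data whole only
puts "$whole $only"     ;# 42 4

# The expression from the question: whole match plus the single outer group.
set results {}
foreach number {3.1415 0. 10} {
    lappend results [regexp -inline {([0-9]+(?:\.[0-9]*)?)} $number]
}
puts $results
```

Each entry in results holds two elements, the whole match and the outer group, which are identical here because the outer parentheses span the entire pattern.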
?: just doesn't create a capturing group. For example, a(?:b) will match the "ab" in "abc" without capturing the "b".
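A small check of that example, assuming the input abc: when `regexp` is given more match variables than the pattern has capturing groups, the extra variables are set to the empty string, which makes the missing group easy to see. The variable names are illustrative.

```tcl
# Non-capturing group: the whole match is "ab", and no group exists to fill sub1.
regexp {a(?:b)} "abc" whole1 sub1
puts "whole=$whole1 sub='$sub1'"   ;# whole=ab sub=''

# Capturing group: same whole match, but "b" is captured as group 1.
regexp {a(b)} "abc" whole2 sub2
puts "whole=$whole2 sub='$sub2'"   ;# whole=ab sub='b'
```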