import re
sstring = "ON Any ON Any"
regex1 = re.compile(r''' \bON\bANY\b''', re.VERBOSE)
regex2 = re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE)
regex3 = re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE)
for a in regex1.findall(sstring): print(a)
print("----------")
for a in regex2.findall(sstring): print(a)
print("----------")
for a in regex3.findall(sstring): print(a)
print("----------")
('ON', '')
('', '')
('', 'Any')
('', '')
('ON', '')
('', '')
('', 'Any')
('', '')
ON
Any
ON
Any
Having read many articles on the internet and on S.O., I think I still don't understand the regex word boundary: `\b`
The first regex doesn't give me the result I expected. I think it should give me "ON Any ON Any", but it doesn't.
The second regex gives me tuples, and I don't understand why, or what `('', '')` means.
The third regex prints the results on separate lines, with empty lines in between.
Could you please help me understand this?
Note that to match `ON ANY` you need to add an escaped space (escaped because you are using the `re.VERBOSE` flag) between `ON` and `ANY`. The `\b` word boundary is a zero-width assertion: it does not consume any text, it only asserts a position between specific characters. That is why your first approach, `re.compile(r''' \bON\bANY\b''', re.VERBOSE)`, fails. Use an escaped space (`\ `) between the two words instead.
See the Python demo
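A minimal sketch of that fix, assuming the goal is to find `ON Any` as a two-word phrase in the sample string. The `re.IGNORECASE` flag is an assumption, added because the pattern spells `ANY` in capitals while the string contains `Any`; the names `broken` and `fixed` are only illustrative.

```python
import re

sstring = "ON Any ON Any"

# Original attempt: under re.VERBOSE the leading space is ignored, and \b
# consumes no characters, so nothing can match the space between the words.
broken = re.compile(r''' \bON\bANY\b''', re.VERBOSE)

# Escaping the space (backslash + space) keeps it significant under re.VERBOSE.
# re.IGNORECASE is an assumption so that "ANY" also matches "Any".
fixed = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE | re.IGNORECASE)

print(broken.findall(sstring))  # []
print(fixed.findall(sstring))   # ['ON Any', 'ON Any']
```

Without `re.VERBOSE`, a plain space in the pattern would work just as well; the escape is only needed because verbose mode strips unescaped whitespace.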
With `re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE)`, `findall` returns tuples because you defined `(...)` capturing groups in the pattern. Each tuple holds the text captured by group 1 and group 2, and a group that did not take part in the match shows up as an empty string.

`re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE)` matches optional sequences, either `ON` or `Any`, so you get those words as values. You get empty values as well because this regex can match just a word boundary (all other subpatterns are optional).

More details about word boundaries:
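As a rough illustration of the points above, the sketch below lists every position where `\b` holds in the sample string and compares what `findall` returns with capturing and non-capturing groups. The helper name `boundary_positions` is made up for this example.

```python
import re

s = "ON Any ON Any"

def boundary_positions(text):
    # \b is zero-width, so each match is empty; only its position matters.
    return [m.start() for m in re.finditer(r"\b", text)]

positions = boundary_positions(s)
print(positions)  # [0, 2, 3, 6, 7, 9, 10, 13]

# Capturing groups: findall returns one tuple per match, one slot per group.
with_groups = re.findall(r"\b(ON)?\b(Any)?", s)

# Non-capturing groups: findall returns the whole matched text instead.
without_groups = re.findall(r"\b(?:ON)?\b(?:Any)?", s)

print(with_groups)
print(without_groups)  # ['ON', '', 'Any', '', 'ON', '', 'Any', '']
```

Each boundary that sits at the end of a word (positions 2, 6, 9 and 13) yields an empty match, because both optional parts match nothing there. That is where the `('', '')` tuples and the blank lines come from.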