Regex for getting all digits in a string after a c

2020-04-08 01:36发布

问题:

I am trying to parse the following string and return all digits after the last square bracket:

C9: Title of object (foo, bar) [ch1, CH12,c03,4]

So the result should be:

1,12,03,4

The string and digits will change. The important thing is to get the digits after the '[' regardless of what character (if any) precede it. (I need this in python so no atomic groups either!) I have tried everything I can think of including:

 \[.*?(\d) = matches '1' only
 \[.*(\d) = matches '4' only
 \[*?(\d) = matches include '9' from the beginning

etc

Any help is greatly appreciated!

EDIT: I also need to do this without using str.split() too.

回答1:

You can rather find all digits in the substring after the last [ bracket:

>>> s = 'C9: Title of object (fo[ 123o, bar) [ch1, CH12,c03,4]'
>>> # Get substring after the last '['.
>>> target_string = s.rsplit('[', 1)[1]
>>>
>>> re.findall(r'\d+', target_string)
['1', '12', '03', '4']

If you can't use split, then this one would work with look-ahead assertion:

>>> s = 'C9: Title of object (fo[ 123o, bar) [ch1, CH12,c03,4]'
>>> re.findall(r'\d+(?=[^[]+$)', s)
['1', '12', '03', '4']

This finds all digits, which are followed by only non-[ characters till the end.



回答2:

It may help to use the non-greedy ?. For example:

\[.*?(\d*?),.*?(\d*?),.*?(\d*?),.*?(\d*?)\]

And, here's how it works (from https://regex101.com/r/jP7hM3/1):

"\[.*?(\d*?),.*?(\d*?),.*?(\d*?),.*?(\d*?)\]"
\[ matches the character [ literally
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
1st Capturing group (\d*?)
\d*? match a digit [0-9]
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
, matches the character , literally
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
2nd Capturing group (\d*?)
\d*? match a digit [0-9]
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
, matches the character , literally
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
3rd Capturing group (\d*?)
\d*? match a digit [0-9]
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
, matches the character , literally
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
4th Capturing group (\d*?)
\d*? match a digit [0-9]
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
\] matches the character ] literally

Although - I have to agree with others... This is a regex solution, but its not a very pythonic solution.