I have a list of regular expressions (about 10-15) that I need to match against some text. Matching them one by one in a loop is too slow. But instead of writing my own state machine to match all the regexes at once, I am trying to join the individual regexes with `|` and let Perl do the work. The problem is: how do I know which of the alternatives matched?
The question "which portion is matched by regex?" addresses the case where the individual regexes contain no capturing groups. What if each regex has capturing groups of its own?
So, given the pattern

`/^(A(\d+))|(B(\d+))|(C(\d+))$/`

and the string "A123", how can I both know that `A123` matched and extract "123"?
`A123` will be in capture group `$1`, and `123` will be in group `$2`.
So you could say:
This is redundant, but you get the idea...
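A minimal sketch of that enumeration, assuming the question's pattern exactly as written and a hypothetical helper `which_matched` (not the answer's original code). Each alternative owns its own pair of groups, so whichever outer group is defined tells you which alternative matched.

```perl
use strict;
use warnings;

# The question's combined pattern: each alternative has its own pair of groups.
my $re = qr/^(A(\d+))|(B(\d+))|(C(\d+))$/;

# Hypothetical helper: report which alternative matched and its digits.
sub which_matched {
    my ($str) = @_;
    return unless $str =~ $re;
    return ('A', $2) if defined $1;    # alternative 1 fills $1 and $2
    return ('B', $4) if defined $3;    # alternative 2 fills $3 and $4
    return ('C', $6) if defined $5;    # alternative 3 fills $5 and $6
    return;
}

for my $s (qw(A123 B45 C6 D7)) {
    my ($kind, $digits) = which_matched($s);
    if (defined $kind) { print "$s: alternative $kind, digits $digits\n" }
    else               { print "$s: no match\n" }
}
```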
EDIT: No, you don't have to enumerate each submatch. You asked how to know whether `A123` matched and how to extract `123`: the `if` block won't be entered unless `A123` matched, and you extract `123` using the `$2` capture variable. So maybe this example would have been clearer:
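A sketch along those lines, using the question's pattern and a hypothetical helper `a_digits` (not the answer's original code). Because non-participating groups are undef after a successful match, checking `defined $1` confirms that the A alternative is the one that matched.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the digits when the A alternative of the
# question's pattern is the one that matched, otherwise nothing.
sub a_digits {
    my ($str) = @_;
    # $1 is defined only when the first alternative took part in the match,
    # so the block is skipped for B..., C..., and non-matching input.
    if ($str =~ /^(A(\d+))|(B(\d+))|(C(\d+))$/ && defined $1) {
        return $2;    # capture variable holding the digits after "A"
    }
    return;
}

my $digits = a_digits('A123');
print "digits: $digits\n";
```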
EDIT 2:
To capture matches in an AoA (array of arrays; a different question, but this should do it):
Result:
Note that I modified your regex, but it looks like that's what you're going for judging by your comment...
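A sketch of collecting matches into an AoA, assuming hypothetical sample strings and an alternation wrapped in a non-capturing group (this may not be exactly the modification described above). Without that group, `^` binds only to the first alternative and `$` only to the last, so a string like "xB45" would match the question's original pattern.

```perl
use strict;
use warnings;

# Grouping the alternation with (?: ... ) makes ^ and $ apply to every
# alternative, unlike /^(A(\d+))|(B(\d+))|(C(\d+))$/.
my $re = qr/^(?:(A)(\d+)|(B)(\d+)|(C)(\d+))$/;

my @strings = qw(A123 B45 xB45 C6 D7);    # hypothetical sample data
my @aoa;                                  # one [prefix, digits] pair per match

for my $s (@strings) {
    # In list context the match returns all six captures; groups from the
    # alternatives that did not participate are undef, so keep only defined ones.
    if (my @caps = $s =~ $re) {
        push @aoa, [ grep { defined } @caps ];
    }
}

print "[@$_]\n" for @aoa;
```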
You don't need to code up your own state machine to combine regexes. Look into Regexp::Assemble. It has methods that'll track which of your original patterns matched.
Edit:
With your example data, it is easy to write a single pattern that captures the letter prefix and the digits separately, after which `$1` will contain the prefix and `$2` the suffix.
I cannot tell whether this is relevant to your real data, but using an additional module seems like overkill.
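For illustration only, assuming the data really is a single letter A, B or C followed by digits, such a pattern might look like this (the helper name `split_code` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: one possible single pattern for the sample data,
# a prefix letter A, B or C followed by one or more digits.
sub split_code {
    my ($str) = @_;
    return $str =~ /^([ABC])(\d+)$/ ? ($1, $2) : ();    # $1 = prefix, $2 = suffix
}

for my $s (qw(A123 B45 D7)) {
    my ($prefix, $digits) = split_code($s);
    if (defined $prefix) { print "$s: prefix $prefix, digits $digits\n" }
    else                 { print "$s: no match\n" }
}
```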
Another thing you can do in Perl is embed Perl code directly in your regex using `(?{...})`, so you can set a variable that tells you which part of the regex matched. WARNING: your regex should not interpolate any variables (outside the embedded Perl code); on older Perls (before 5.18) that triggers an "Eval-group not allowed at runtime" error unless you `use re 'eval'`. Here is a sample parser that uses this feature:
which prints out the following:
It's kind of contrived, but you'll notice that the quotes (actually, apostrophes) around strings are stripped off (also, consecutive quotes are collapsed to single quotes), so in general, only the $kind variable will tell you whether the parser saw an identifier or a quoted string.
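A separate minimal sketch of the same technique, applied to the question's A/B/C data: each alternative ends in a `(?{ ... })` block that records its label. The helper `classify` and the variable `$which` are hypothetical, and a package variable is used to sidestep closure quirks with lexicals inside code blocks on older Perls. Because every successful path runs its own block last, `$which` ends up naming the alternative that actually matched.

```perl
use strict;
use warnings;

# Package variable set from inside the regex's embedded code blocks.
our $which;

# Hypothetical helper: returns (label, digits) for the alternative that matched.
sub classify {
    my ($str) = @_;
    local $which;
    if ($str =~ /^(?: A (\d+) (?{ $which = 'A' })
                    | B (\d+) (?{ $which = 'B' })
                    | C (\d+) (?{ $which = 'C' })
                  )$/x) {
        # Only the group inside the matching alternative is defined.
        my ($digits) = grep { defined } $1, $2, $3;
        return ($which, $digits);
    }
    return;
}

for my $s (qw(A123 C6 D7)) {
    my ($kind, $digits) = classify($s);
    if (defined $kind) { print "$s: $kind, $digits\n" }
    else               { print "$s: no match\n" }
}
```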
Why not use `/^ (?<prefix> A|B|C) (?<digits> \d+) $/x`? Note that the named capture groups are used for clarity and are not essential.
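A short usage sketch for that pattern (named captures need Perl 5.10 or later). The captured text is read back from the `%+` hash, and the helper `parse_code` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: %+ maps each group name to what it captured
# in the last successful match.
sub parse_code {
    my ($str) = @_;
    return unless $str =~ /^ (?<prefix> A|B|C) (?<digits> \d+) $/x;
    return ($+{prefix}, $+{digits});
}

my ($prefix, $digits) = parse_code('B45');
print "prefix=$prefix digits=$digits\n";
```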