I was using a regex to extract data from curved brackets (parentheses), for example extracting a,b from (a,b), as shown below. I have a file in which every line looks like this:
this is the range of values (a1,b1) and [b1|a1]
this is the range of values (a2,b2) and [b2|a2]
this is the range of values (a3,b3) and [b3|a3]
I'm using the following regex to extract a1,b1, a2,b2, etc.:
@numbers = $_ =~ /\((.*),(.*)\)/;
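Here is a minimal runnable sketch of the loop the question describes. It assumes the lines come from an in-memory list instead of the real file, and the names @lines and @results are hypothetical. The regex is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical sample input mirroring the file format described above.
my @lines = (
    'this is the range of values (a1,b1) and [b1|a1]',
    'this is the range of values (a2,b2) and [b2|a2]',
    'this is the range of values (a3,b3) and [b3|a3]',
);

my @results;
for (@lines) {
    # In list context a successful match returns the captured groups
    # ($1, $2); a failed match returns an empty list.
    my @numbers = $_ =~ /\((.*),(.*)\)/;
    push @results, [@numbers] if @numbers;
}

# Prints one "a,b" pair per line.
print join(',', @$_), "\n" for @results;
```

This works on these lines only because the square-bracket part contains no comma or closing parenthesis. The greedy .* would otherwise run past the first pair.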
However, how can I extract the data from the square brackets [] instead? For example, from these lines:
this is the range of values (a1,b1) and [b1|a1]
this is the range of values (a1,b1) and [b2|a2]
I need to extract/match only the data in square brackets and not the curved brackets.
Use a pattern that escapes the square brackets and captures the two values on either side of the |.
If the pattern matches successfully, the extracted values are stored in $1 and $2.
You can match each value using the non-greedy quantifier *?.
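The original answer's code is not preserved here. The following is only a minimal sketch of the non-greedy approach, and it assumes | separates the two values as in the sample lines. It reads the captures from $1 and $2. It also shows why the non-greedy form matters once a line holds more than one bracket group. The second test string is hypothetical.

```perl
use strict;
use warnings;

my $line = 'this is the range of values (a1,b1) and [b2|a2]';

# \[ and \] match literal square brackets; \| matches a literal pipe.
# The non-greedy .*? stops at the first | and the first ] it can.
my ($first, $second);
if ($line =~ /\[(.*?)\|(.*?)\]/) {
    ($first, $second) = ($1, $2);
}
print "$first $second\n";

# With two bracket groups on one line, the greedy version overshoots.
my $two = 'x [a|b] y [c|d]';
my @greedy = $two =~ /\[(.*)\|(.*)\]/;    # ('a|b] y [c', 'd')
my @lazy   = $two =~ /\[(.*?)\|(.*?)\]/;  # ('a', 'b')
print "greedy: @greedy\nlazy: @lazy\n";
```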
[Update] In the meantime, I've written a blog post about the specific issue with .* that I describe below: Why Using .* in Regular Expressions Is Almost Never What You Actually Want.
If your identifiers a1, b1, etc. never contain commas or square brackets themselves, you should use a pattern built from negated character classes (such as [^,\]]+) instead of .* to avoid backtracking hell. Here's a working example on Regex101.
The issue with greedy quantifiers like .* is that they will very likely consume too much at first, so the regex engine has to backtrack extensively. Even with non-greedy quantifiers, the engine makes more match attempts than necessary, because it consumes only one character at a time and then tries to advance in the pattern. (You could even use atomic groups to make the matching more efficient.)
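The following sketch illustrates the atomic-group remark. It reuses the negated-class pattern from above and assumes Perl 5.10 or later for the possessive form. Once (?>...) has matched, the engine discards its saved backtracking positions. The possessive quantifier ++ is shorthand for the same idea. On a matching line the captures are identical to the plain version. The difference shows up only when a match fails, because the groups never give characters back. The broken line is a hypothetical input.

```perl
use strict;
use warnings;

my $line = 'this is the range of values (a1,b1) and [b1|a1]';

# Atomic groups: once (?>...) has matched, its backtracking
# positions are thrown away.
my @atomic = $line =~ /\[((?>[^|\[\]]+))\|((?>[^|\[\]]+))\]/;

# Possessive quantifiers (++) express the same thing more compactly.
my @possessive = $line =~ /\[([^|\[\]]++)\|([^|\[\]]++)\]/;

print "atomic: @atomic\npossessive: @possessive\n";

# With no closing ']' the match fails; the possessive groups never
# give back characters, so shorter group lengths are not retried.
my $broken = 'this line is broken [b1|a1';
print "no match\n" unless $broken =~ /\[([^|\[\]]++)\|([^|\[\]]++)\]/;
```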