I was using a regex to extract data from round brackets (parentheses), for example extracting a,b from (a,b), as shown below. I have a file in which every line looks like this:
this is the range of values (a1,b1) and [b1|a1]
this is the range of values (a2,b2) and [b2|a2]
this is the range of values (a3,b3) and [b3|a3]
I'm using the following pattern to extract a1,b1, a2,b2, etc.:
@numbers = $_ =~ /\((.*),(.*)\)/
However, how can I extract the data from square brackets []? For example:
this is the range of values (a1,b1) and [b1|a1]
this is the range of values (a1,b1) and [b2|a2]
I need to extract/match only the data in square brackets and not the curved brackets.
[Update] In the meantime, I've written a blog post about the specific issue with .* that I describe below: "Why Using .* in Regular Expressions Is Almost Never What You Actually Want"
If your identifiers a1, b1, etc. never contain commas or square brackets themselves, you should use a pattern along the lines of the following to avoid backtracking hell:
/\[([^,\]]+),([^,\]]+)\]/
Here's a working example on Regex101.
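The sample lines in the question separate the bracketed values with | rather than a comma, so the same negated-character-class idea can be adapted to that delimiter. This is a minimal sketch, assuming the values never contain | or ] themselves; the @lines and @pairs names are only for illustration:

```perl
use strict;
use warnings;

# Sample lines taken from the question.
my @lines = (
    'this is the range of values (a1,b1) and [b1|a1]',
    'this is the range of values (a2,b2) and [b2|a2]',
);

my @pairs;
for my $line (@lines) {
    # [^|\]]+ matches one or more characters that are neither '|' nor ']',
    # so each capture stops at its delimiter instead of overshooting.
    if ($line =~ /\[([^|\]]+)\|([^|\]]+)\]/) {
        push @pairs, [$1, $2];
    }
}

print join(',', @$_), "\n" for @pairs;
```

Because the pattern starts with a literal [, the (a1,b1) part of each line is never considered.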
The issue with greedy quantifiers like .* is that they will very likely consume too much at first, so the regex engine has to backtrack extensively. Even with non-greedy quantifiers, the engine makes more match attempts than necessary, because it consumes only one character at a time and then tries to advance its position in the pattern.
(You could even use atomic groups to make the matching even more performant.)
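As a rough illustration of the atomic-group idea, here is a sketch that wraps each negated character class in (?>...). It is adapted to the | separator from the question's sample lines and assumes the values contain neither | nor ]; the variable names are hypothetical:

```perl
use strict;
use warnings;

my $line = 'this is the range of values (a1,b1) and [b1|a1]';

# (?>...) is an atomic group: once it has matched, the engine does not
# backtrack into it. The possessive form [^|\]]++ (Perl 5.10 and later)
# behaves the same way.
my ($first, $second) = $line =~ /\[((?>[^|\]]+))\|((?>[^|\]]+))\]/;

print "$first $second\n" if defined $first;
```

For input this small the difference is unlikely to be measurable; the atomic groups mainly help the match fail quickly on long lines that do not fit the pattern.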
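#!/usr/bin/perl
use strict;
use warnings;

my @numbers;
# Test the read itself, then chomp, so that a final line without a
# trailing newline is not skipped.
while (my $line = <DATA>) {
    chomp $line;
    if ($line =~ m|\[(.*),(.*)\]|) {
        push @numbers, ($1, $2);
    }
}
print "@numbers\n";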
__DATA__
this is the range of values [a1,b1]
this is the range of values [a2,b2]
this is the range of values [a3,b3]
You can match it using the non-greedy quantifier *?
my @numbers = $_ =~ /\[(.*?),(.*?)\]/g;
or
my @numbers = /\[(.*?),(.*?)\]/g;
for short.
UPDATE
my @numbers = /\[(.*?)\|(.*?)\]/g;
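The one-liners above rely on two Perl behaviours: a match without an explicit target is applied to $_, and a /g match in list context returns all captured groups. A small sketch on the question's sample lines (the @lines array is only for illustration):

```perl
use strict;
use warnings;

my @lines = (
    'this is the range of values (a1,b1) and [b1|a1]',
    'this is the range of values (a2,b2) and [b2|a2]',
);

my @numbers;
for (@lines) {
    # No explicit target, so the pattern is matched against $_.
    # In list context with /g, every captured group is returned.
    push @numbers, /\[(.*?)\|(.*?)\]/g;
}

print "@numbers\n";
```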
Use the code below:
$_ =~ /\[(.*?)\|(.*?)\]/;
If the pattern matches successfully, the extracted values are stored in $1 and $2.
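Since $1 and $2 are only set by a successful match, and otherwise keep whatever an earlier successful match left in them, it is safer to test the match before reading them. A minimal sketch with made-up input strings:

```perl
use strict;
use warnings;

my @results;
for ('range (a1,b1) and [b1|a1]', 'no brackets here') {
    # Only read $1 and $2 inside the branch where the match succeeded.
    if (/\[(.*?)\|(.*?)\]/) {
        push @results, "$1 $2";
    }
    else {
        push @results, 'no match';
    }
}

print "$_\n" for @results;
```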