Regex speed in Perl 6

Published 2020-02-14 03:49

Question:

Until now I have only worked with regular expressions in bash, grep, sed, awk and so on. After trying Perl 6 regexes, I got the impression that they run slower than I would expect, but perhaps I am using them incorrectly. I wrote a simple test to compare similar operations in Perl 6 and in bash. Here is the Perl 6 code:

my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5

my @search = <abcde cdeff fabcd>;

my token search {
    @search
}

my @new_array = @array.grep({/ <search> /});
say @new_array;

Then I printed @array to a file named array (7776 lines), created a file named search with 3 lines (abcde, cdeff, fabcd), and ran a simple grep search.

$ grep -f search array

Both programs produced the same result, as expected, so I measured how long each one took to run.

$ time perl6 search.p6
real    0m6,683s
user    0m6,724s
sys     0m0,044s
$ time grep -f search array
real    0m0,009s
user    0m0,008s
sys     0m0,000s

So, what am I doing wrong in my Perl 6 code?

UPD: If I loop over the @search array and pass each search token to grep separately, the program runs much faster:

my @array = "aaaaa" .. "fffff";
say +@array;

my @search = <abcde cdeff fabcd>;

for @search -> $token {
  say ~@array.grep({/$token/});
}

$ time perl6 search.p6
real    0m1,378s
user    0m1,400s
sys     0m0,052s

And if I write out each search pattern by hand, it runs even faster:

my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5

say ~@array.grep({/abcde/});
say ~@array.grep({/cdeff/});
say ~@array.grep({/fabcd/});

$ time perl6 search.p6
real    0m0,587s
user    0m0,632s
sys     0m0,036s

Answer 1:

The grep command is much simpler than Perl 6's regular expressions, and it has had many more years of optimization. Regexes are also one of the areas of Rakudo that haven't seen as much optimization, partly because they are seen as difficult to work on.


For a more performant example, you could pre-compile the regex:

my $search = "/@search.join('|')/".EVAL;
#  $search =  /abcde|cdeff|fabcd/;
say ~@array.grep($search);

That change causes it to run in about half a second.

If there is any chance of malicious data in @search, and you still have to use EVAL, it may be safer to use:

"/@search».Str».perl.join('|')/".EVAL
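To see why the quoted form is safer, it can help to print the source text that gets handed to EVAL. The following is a minimal sketch; the @words list and the variable names are made up for illustration and contain regex metacharacters on purpose.

```raku
use MONKEY-SEE-NO-EVAL;   # harmless if the method form of EVAL does not need it

# Hypothetical search terms that contain regex metacharacters.
my @words = <a|b c.d>;

# .perl wraps each string in double quotes, and a quoted string
# inside a regex is matched literally.
my $quoted = @words».Str».perl.join('|');
say $quoted;              # "a|b"|"c.d"

my $rx = "/$quoted/".EVAL;
say so "a|b" ~~ $rx;      # True:  the literal text a|b is present
say so "a"   ~~ $rx;      # False: | inside the quotes is not an alternation
say so "cxd" ~~ $rx;      # False: . inside the quotes is not a wildcard
```

With the unquoted `@search.join('|')` form, the same terms would be compiled as regex syntax rather than as literal text.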

The compiler can't quite generate that optimized code for /@search/, because @search could change after the regex gets compiled. What could happen is that the first time the regex is used, it gets re-compiled into the better form, which is then cached for as long as @search doesn't get modified.
(I think Perl 5 does something similar.)
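The caching idea can be approximated by hand in user code. This is only a sketch of the idea, not something Rakudo does internally; cached-regex is a hypothetical helper name, and it assumes the search terms never contain a newline, which is used as the key separator.

```raku
use MONKEY-SEE-NO-EVAL;   # harmless if the method form of EVAL does not need it

my @array  = "aaaaa" .. "fffff";
my @search = <abcde cdeff fabcd>;

# Hypothetical helper: compiles one alternation per distinct word list
# and reuses the compiled regex until the list contents change.
sub cached-regex(@words) {
    state %cache;
    my $key = @words.join("\n");   # assumes no word contains a newline
    %cache{$key} //= do {
        my $alternation = @words».Str».perl.join('|');
        "/$alternation/".EVAL;
    };
}

say ~@array.grep(cached-regex(@search));   # abcde cdeff fabcd
```

Modifying @search produces a different key, so the next call compiles a fresh regex, while repeated calls with an unchanged list skip the EVAL.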

One important fact you have to keep in mind is that a regex in Perl 6 is just a method that is written in a domain specific sub-language.
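A quick way to check this is to ask a regex object about its type. A small sketch, using literal alternatives in place of the @search array from the question:

```raku
my $rx = /abcde/;
say $rx.WHAT;          # (Regex)
say $rx ~~ Method;     # True: Regex is a subclass of Method

# A named token is the same kind of object, stored as &search.
my token search { abcde | cdeff | fabcd }
say &search ~~ Method; # True
```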



Tags: raku