It seems that using a character class is faster than the alternation in an example like:
[abc]
vs (a|b|c)
I have heard about it being recommended and with a simple test using Time::HiRes
I verified it (~10 times slower).
Also using (?:a|b|c)
in case the capturing parenthesis makes a difference does not change the result.
But I can not understand why. I think it is because of backtracking but the way I see it at each position there are 3 character comparison so I am not sure how backtracking hits in affecting the alternation. Is it a result of the implementation's nature of alternation?
相关问题
- Faster loop: foreach vs some (performance of jsper
- $ENV{$variable} in perl
- Why wrapping a function into a lambda potentially
- Ado.net performance:What does SNIReadSync do?
- Device support warning : Google play 2019
相关文章
- Optimization techniques for backtracking regex imp
- Regex to check for new line
- Allow only 2 decimal points entry to a textbox usi
- Running a perl script on windows without extension
- DOM penalty of using html attributes
- Which is faster, pointer access or reference acces
- Comparing speed of non-matching regexp
- Django is sooo slow? errno 32 broken pipe? dcramer
Because a character class like
[abc]
is irreducable and can be optimised, whereas an alternation like(?:a|b|c)
may also be(?:aa(?!xx)|[^xba]*?|t(?=.[^t])t)
.The authors have chosen not to optimise the regex compiler to check that all elements of an alternation are a single character.
There is a big difference between "check that the next character is in this character class" and "check that the rest of the string matches any one of these regular expressions".
This is because the "OR" construct
|
backtracks between the alternation: If the first alternation is not matched, the engine has to return before the pointer location moved during the match of the alternation, to continue matching the next alternation; Whereas the character class can advance sequentially. See this match on a regex engine with optimizations disabled:But to be short, the fact that pcre engine optimizes this (single literal characters -> character class) away is already a decent hint that alternations are inefficient.