I have a regular expression:
(<select([^>]*>))(.*?)(</select\s*>)
Since it uses lazy repeat quantifier, for longer strings(having options more than 500) it backtracks for more than 100,000 times and fails. Please help me to find a better regular expression which doesn't use lazy repeat quantifier
...or in human-readable form:
This is an example of the "unrolled loop" technique Friedl develops in his book, Mastering Regular Expressions. I did a quick test in RegexBuddy using a pattern based on reluctant quantifiers:
...and it took about 6,000 steps to find a match. The unrolled-loop pattern took only 500 steps. And when I removed the closing bracket from the end tag (
</select
), making a match impossible, it required only 800 steps to report failure.If your regex flavor supports possessive quantifiers, go ahead and use them, too:
It takes about the same number of steps to achieve a match, but it can use a lot less memory in the process. And if no match is possible, it fails even more quickly; in my tests it took about 500 steps, the same number it took to find a match.
Unfortunately this wont work, see the answer by Alan Moore for a correct example!
From the perl regexp manpage: