Is there any benefit in using compile for regular expressions in Python?
h = re.compile('hello')
h.match('hello world')
vs
re.match('hello', 'hello world')
Is there any benefit in using compile for regular expressions in Python?
h = re.compile('hello')
h.match('hello world')
vs
re.match('hello', 'hello world')
In general, I find it is easier to use flags (at least easier to remember how), like
re.I
when compiling patterns than to use flags inline.vs
This answer might be arriving late but is an interesting find. Using compile can really save you time if you are planning on using the regex multiple times (this is also mentioned in the docs). Below you can see that using a compiled regex is the fastest when the match method is directly called on it. passing a compiled regex to re.match makes it even slower and passing re.match with the patter string is somewhere in the middle.
The votes on the accepted answer leads to the assumption that what @Triptych says is true for all cases. This is not necessarily true. One big difference is when you have to decide whether to accept a regex string or a compiled regex object as a parameter to a function:
It is always better to compile your regexs in case you need to reuse them.
Note the example in the timeit above simulates creation of a compiled regex object once at import time versus "on-the-fly" when required for a match.
My understanding is that those two examples are effectively equivalent. The only difference is that in the first, you can reuse the compiled regular expression elsewhere without causing it to be compiled again.
Here's a reference for you: http://diveintopython3.ep.io/refactoring.html
Regular Expressions are compiled before being used when using the second version. If you are going to executing it many times it is definatly better to compile it first. If not compiling every time you match for one off's is fine.
I've had a lot of experience running a compiled regex 1000s of times versus compiling on-the-fly, and have not noticed any perceivable difference. Obviously, this is anecdotal, and certainly not a great argument against compiling, but I've found the difference to be negligible.
EDIT: After a quick glance at the actual Python 2.5 library code, I see that Python internally compiles AND CACHES regexes whenever you use them anyway (including calls to
re.match()
), so you're really only changing WHEN the regex gets compiled, and shouldn't be saving much time at all - only the time it takes to check the cache (a key lookup on an internaldict
type).From module re.py (comments are mine):
I still often pre-compile regular expressions, but only to bind them to a nice, reusable name, not for any expected performance gain.