Is there any benefit in using compile for regular expressions in Python?
h = re.compile('hello')
h.match('hello world')
vs
re.match('hello', 'hello world')
There is one additional perk of using re.compile(): adding comments to my regex patterns using re.VERBOSE.
Although this does not affect the speed of your code, I like to do it this way as it is part of my commenting habit. I thoroughly dislike spending time trying to remember the logic that went into my code two months down the line when I want to make modifications.
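As a minimal sketch of what such a commented pattern might look like (the pattern, its group names, and the sample string are hypothetical, chosen only for illustration):

```python
import re

# Hypothetical pattern for illustration: a simple decimal number.
# With re.VERBOSE, whitespace inside the pattern is ignored and
# everything after an unescaped '#' is treated as a comment.
NUMBER = re.compile(r"""
    (?P<integer>\d+)        # integer part: one or more digits
    (?:                     # optional fractional part
        \.                  #   literal decimal point
        (?P<fraction>\d+)   #   fractional digits
    )?
""", re.VERBOSE)

m = NUMBER.match("3.14 is pi")
integer_part = m.group("integer")
fraction_part = m.group("fraction")
print(integer_part, fraction_part)
```

The comments live next to the piece of the pattern they describe, which is the part that tends to be forgotten first.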
Here's a simple test case:
with re.compile:
So, it would seem that compiling is faster in this simple case, even if you only match once.
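A comparison along these lines can be sketched with timeit. This is only an assumed setup reusing the pattern and string from the question; the iteration count N is arbitrary, and the numbers will vary by machine and Python version:

```python
import re
import timeit

PATTERN = "hello"
TEXT = "hello world"

def match_with_module_function():
    # Goes through re.match, which looks the pattern up in re's cache.
    return re.match(PATTERN, TEXT)

compiled = re.compile(PATTERN)

def match_with_compiled_object():
    # Calls the method on the precompiled pattern object directly.
    return compiled.match(TEXT)

N = 100_000  # arbitrary iteration count for this sketch
t_module = timeit.timeit(match_with_module_function, number=N)
t_compiled = timeit.timeit(match_with_compiled_object, number=N)
print("re.match:         %.4f s" % t_module)
print("compiled.match:   %.4f s" % t_compiled)
```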
I ran this test before stumbling upon the discussion here. However, having run it I thought I'd at least post my results.
I stole and bastardized the example in Jeff Friedl's "Mastering Regular Expressions". This is on a MacBook running OS X 10.6 (2 GHz Intel Core 2 Duo, 4 GB RAM). The Python version is 2.6.1.
Run 1 - using re.compile
Run 2 - Not using re.compile
I just tried this myself. For the simple case of parsing a number out of a string and summing it, using a compiled regular expression object is about twice as fast as using the re methods.
As others have pointed out, the re methods (including re.compile) look up the regular expression string in a cache of previously compiled expressions. Therefore, in the normal case, the extra cost of using the re methods is simply the cost of the cache lookup.
However, examination of the code shows the cache is limited to 100 expressions. This raises the question: how painful is it to overflow the cache? The code contains an internal interface to the regular expression compiler, re.sre_compile.compile. If we call it, we bypass the cache. It turns out to be about two orders of magnitude slower for a basic regular expression, such as r'\w+\s+([0-9_]+)\s+\w*'.
Here's my test:
The 'reallyCompiled' methods use the internal interface, which bypasses the cache. Note the one that compiles on each loop iteration is only iterated 10,000 times, not one million.
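The cost of missing the cache can also be sketched without touching the internal interface. The version below is a swapped-in technique, not the test described above: it uses the public re.purge() to empty the cache before each call, so the pattern is recompiled every time. The sample line and iteration count are assumptions made for illustration.

```python
import re
import timeit

PATTERN = r'\w+\s+([0-9_]+)\s+\w*'
LINE = "item 12_345 units"  # hypothetical input line

def cached_match():
    # Compiles on the first call, then hits re's internal cache.
    return re.match(PATTERN, LINE)

def uncached_match():
    # re.purge() clears the cache, forcing a full recompile each call.
    re.purge()
    return re.match(PATTERN, LINE)

N = 2_000  # arbitrary; kept small because recompiling is slow
t_cached = timeit.timeit(cached_match, number=N)
t_uncached = timeit.timeit(uncached_match, number=N)
print("cached:   %.4f s" % t_cached)
print("uncached: %.4f s" % t_uncached)
```

Note that the uncached timing also includes the cost of the re.purge() call itself, so it is only a rough indication of the compile cost.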
Mostly, there is little difference whether you use re.compile or not. Internally, all of the functions are implemented in terms of a compile step:
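A rough sketch of that relationship, assuming a hypothetical helper named match_via_compile (this is not the actual source of the re module, only an illustration of the compile-then-delegate shape):

```python
import re

def match_via_compile(pattern, string, flags=0):
    # Hypothetical helper: compile the pattern (or fetch it from re's
    # cache), then delegate to the compiled object's own method.
    return re.compile(pattern, flags).match(string)

a = re.match('hello', 'hello world')
b = match_via_compile('hello', 'hello world')
print(a.group(), a.span())
print(b.group(), b.span())
```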
In addition, calling methods on a pattern precompiled with re.compile() bypasses the extra indirection and cache lookup on each call:
In addition to the small speed benefit from using re.compile, people also like the readability that comes from naming potentially complex pattern specifications and separating them from the business logic where they are applied:
Note, one other respondent incorrectly believed that pyc files stored compiled patterns directly; however, in reality they are rebuilt each time the PYC is loaded:
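For instance, a sketch of that separation (the pattern name ISO_DATE, the function extract_dates, and the sample text are all hypothetical):

```python
import re

# Pattern specifications, named and kept apart from the logic below.
ISO_DATE = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

def extract_dates(text):
    # Business logic reads in terms of the pattern's name,
    # not its regex syntax.
    return ISO_DATE.findall(text)

dates = extract_dates("Released 2009-01-16, patched 2009-02-03")
print(dates)
```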
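One way to check this yourself, as a sketch: compile a small module source and inspect the constants that would be written to the bytecode file. The source string below is hypothetical and only stands in for a small module that calls re.compile at import time.

```python
import re

# Hypothetical module source for illustration.
SOURCE = "import re\nvowels = re.compile('[aeiou]{2,5}')\n"

code = compile(SOURCE, "tmp.py", "exec")

# These constants are what gets marshalled into the bytecode file.
constants = code.co_consts
has_pattern_string = '[aeiou]{2,5}' in constants
has_compiled_pattern = any(isinstance(c, re.Pattern) for c in constants)
print(constants)
print(has_pattern_string, has_compiled_pattern)
```

Only the pattern string appears among the constants; the call to re.compile still has to run when the module is loaded.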
The above disassembly comes from the PYC file for a tmp.py containing:
I really respect all the above answers. In my opinion, yes! It is certainly worth using re.compile instead of compiling the regex again and again, every time.
Example:
Using in findall
Using in search
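A minimal sketch of both uses with a single compiled pattern (the pattern WORD and the sample text are hypothetical):

```python
import re

# Compile once, then reuse the same object for different operations.
WORD = re.compile(r'[A-Za-z]+')
text = "compile once, reuse many times"

# findall returns every non-overlapping match as a list of strings.
all_words = WORD.findall(text)

# search returns a match object for the first match, or None.
first = WORD.search(text)

print(all_words)
print(first.group())
```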