Is there any benefit in using compile for regular expressions in Python?
h = re.compile('hello')
h.match('hello world')
vs
re.match('hello', 'hello world')
(months later) It's easy to add your own cache around re.match, or anything else for that matter.
A wibni (wouldn't it be nice if): cachehint( size= ), cacheinfo() -> size, hits, nclear ...
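A minimal sketch of such a cache, assuming Python 3.2+ so that functools.lru_cache is available. The helper names compiled and cached_match are hypothetical, not part of re. lru_cache happens to cover part of the wish list: maxsize plays the role of a size hint, and cache_info() reports hits, misses and the current size.

```python
import re
from functools import lru_cache

# Hypothetical helpers; only re.compile and functools.lru_cache are real APIs.
@lru_cache(maxsize=256)
def compiled(pattern, flags=0):
    """Compile a pattern once and hand back the same regex object afterwards."""
    return re.compile(pattern, flags)

def cached_match(pattern, string, flags=0):
    """Stand-in for re.match that goes through our own cache."""
    return compiled(pattern, flags).match(string)

m = cached_match('hello', 'hello world')
print(m.group(0))
print(compiled.cache_info())  # hits, misses, maxsize, currsize
```

The same decorator works around any other function whose arguments are hashable, which is the "anything else for that matter" part.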
Besides the performance, using compile helps me to distinguish the concepts of:
1. module (re),
2. regex object
3. match object
When I started learning regex
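A small sketch of the three concepts side by side. The pattern and the input string are made up for illustration.

```python
import re

# 1. module: re itself, which provides compile(), search(), match(), ...

# 2. regex object: the compiled pattern
regex_object = re.compile(r'[a-zA-Z]+')

# 3. match object: the result of applying the regex object to a string
match_object = regex_object.search('1.Hello')
print(match_object.group())  # prints Hello

# The one-step form goes from module to match object
# without ever naming the regex object.
print(re.search(r'[a-zA-Z]+', '1.Hello').group())  # prints Hello
```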
As a complement, I made an exhaustive cheatsheet of module re for your reference.

i'd like to motivate that pre-compiling is both conceptually and 'literately' (as in 'literate programming') advantageous. have a look at this code snippet:
in your application, you'd write:
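A minimal sketch of both pieces, the library module and the application code that uses it, shown in one block so it runs on its own. Only the names TYPO and _text_has_foobar_re_search come from the text; the method name text_has_foobar, the pattern and the file name are assumptions.

```python
# --- library module (hypothetical file name: typo.py) ---
import re as _re

class TYPO:
    # Compiled once, when the class body runs. The .search method of the
    # compiled regex is stored directly, so a call needs no pattern lookup.
    _text_has_foobar_re_search = _re.compile(r"(?i)foobar").search

    def text_has_foobar(self, text):
        return self._text_has_foobar_re_search(text) is not None

# Rebind the name to a single instance: the "library object".
TYPO = TYPO()

# --- application code; in a real project: from typo import TYPO ---
print(TYPO.text_has_foobar('FOObar'))  # prints True
```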
this is about as simple in terms of functionality as it can get. because this example is so short, i conflated the way to get _text_has_foobar_re_search all in one line. the disadvantage of this code is that it occupies a little memory for whatever the lifetime of the TYPO library object is; the advantage is that when doing a foobar search, you'll get away with two function calls and two class dictionary lookups. how many regexes are cached by re and the overhead of that cache are irrelevant here.

compare this with the more usual style, below:
In the application:
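A minimal sketch of the more usual style, again with library and application in one block so it runs on its own. The class name Typo, the method name and the pattern are assumptions chosen to mirror the pre-compiled variant.

```python
# --- library module (hypothetical file name: typo.py) ---
import re

class Typo:
    def text_has_foobar(self, text):
        # The pattern lives inside the method body, and re.compile
        # consults the module's internal cache on every call.
        return re.compile(r"(?i)foobar").search(text) is not None

# --- application code; in a real project: from typo import Typo ---
typo = Typo()
print(typo.text_has_foobar('FOObar'))  # prints True
```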
I readily admit that my style is highly unusual for python, maybe even debatable. however, in the example that more closely matches how python is mostly used, in order to do a single match, we must instantiate an object, do three instance dictionary lookups, and perform three function calls; additionally, we might get into re caching troubles when using more than 100 regexes (the size of re's internal cache in older Python versions; newer ones cache more). also, the regular expression gets hidden inside the method body, which most of the time is not such a good idea.

be it said that every subset of measures (targeted, aliased import statements; aliased methods where applicable; reduction of function calls and object dictionary lookups) can help reduce computational and conceptual complexity.
For me, the biggest benefit to re.compile isn't any kind of premature optimization (which is the root of all evil, anyway). It's being able to separate definition of the regex from its use.

Even a simple expression such as 0|[1-9][0-9]* (integer in base 10 without leading zeros) can be complex enough that you'd rather not have to retype it, check if you made any typos, and later have to recheck if there are typos when you start debugging. Plus, it's nicer to use a variable name such as num or num_b10 than 0|[1-9][0-9]*.

It's certainly possible to store strings and pass them to re.match; however, that's less readable:
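A short sketch of the string-based variant. The variable text and its value are made up for illustration.

```python
import re

num = r"0|[1-9][0-9]*"  # the pattern kept as a plain string
text = "42 apples"      # hypothetical input

# then, much later:
m = re.match(num, text)
```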
Versus compiling:
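The same sketch with the pattern compiled where it is defined. Again, text and its value are made up for illustration.

```python
import re

num = re.compile(r"0|[1-9][0-9]*")  # the regex object carries its own methods
text = "42 apples"                  # hypothetical input

# then, much later:
m = num.match(text)
```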
Though it is fairly close, the last line of the second feels more natural and simpler when used repeatedly.