Is it worth using Python's re.compile?

Posted 2018-12-31 21:15

Is there any benefit in using compile for regular expressions in Python?

h = re.compile('hello')
h.match('hello world')

vs

re.match('hello', 'hello world')

Tags: python regex
22 answers
骚的不知所云
#2 · 2018-12-31 22:05

In general, I find it is easier to use flags (at least easier to remember how), such as re.I, when compiling patterns than to embed the flags inline in the pattern.

>>> foo_pat = re.compile('foo',re.I)
>>> foo_pat.findall('some string FoO bar')
['FoO']

vs

>>> re.findall('(?i)foo','some string FoO bar')
['FoO']
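For what it's worth, several flags can be combined with the | operator at compile time; a quick sketch (my own example, not a benchmark):

>>> import re
>>> pat = re.compile(r'^foo', re.I | re.M)   # case-insensitive and multi-line at once
>>> pat.findall('FOO bar\nfoo baz')
['FOO', 'foo']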
孤独寂梦人
#3 · 2018-12-31 22:05

This answer might be arriving late, but it is an interesting find. Using compile can really save you time if you plan on using the regex multiple times (this is also mentioned in the docs). Below you can see that using a compiled regex is fastest when the match method is called directly on it; passing a compiled regex to re.match is the slowest of the three, and passing re.match the pattern string falls somewhere in the middle.

>>> ipr = r'\D+((([0-2][0-5]?[0-5]?)\.){3}([0-2][0-5]?[0-5]?))\D+'
>>> average(*timeit.repeat("re.match(ipr, 'abcd100.10.255.255 ')", globals={'ipr': ipr, 're': re}))
1.5077415757028423
>>> ipr = re.compile(ipr)
>>> average(*timeit.repeat("re.match(ipr, 'abcd100.10.255.255 ')", globals={'ipr': ipr, 're': re}))
1.8324008992184038
>>> average(*timeit.repeat("ipr.match('abcd100.10.255.255 ')", globals={'ipr': ipr, 're': re}))
0.9187896518778871
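The average helper is not defined in the answer; something minimal along these lines (an assumption on my part) reproduces the idea of averaging the timings returned by timeit.repeat:

>>> def average(*timings):
...     # plain arithmetic mean of the per-repeat results
...     return sum(timings) / len(timings)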
浮光初槿花落
#4 · 2018-12-31 22:05

The accepted answer states: "I've had a lot of experience running a compiled regex 1000s of times versus compiling on-the-fly, and have not noticed any perceivable difference."

The votes on the accepted answer lead to the assumption that what @Triptych says is true for all cases. This is not necessarily true. One big difference arises when you have to decide whether to accept a regex string or a compiled regex object as a parameter to a function:

>>> timeit.timeit(setup="""
... import re
... f=lambda x, y: x.match(y)       # accepts compiled regex as parameter
... h=re.compile('hello')
... """, stmt="f(h, 'hello world')")
0.32881879806518555
>>> timeit.timeit(setup="""
... import re
... f=lambda x, y: re.compile(x).match(y)   # compiles when called
... """, stmt="f('hello', 'hello world')")
0.809190034866333

It is always better to compile your regexes if you need to reuse them.

Note that the example in the timeit above simulates creating a compiled regex object once at import time versus compiling it "on-the-fly" when required for a match.
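In practice that usually means compiling once at module level and reusing the pattern object inside functions. A rough sketch of the two layouts (names are mine):

import re

HELLO = re.compile('hello')          # compiled once, at import time

def greets(text):
    # reuses the precompiled pattern on every call
    return HELLO.match(text) is not None

def greets_on_the_fly(text):
    # pays for a compile (or at least a cache lookup) on every call
    return re.compile('hello').match(text) is not None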

高级女魔头
#5 · 2018-12-31 22:05

My understanding is that those two examples are effectively equivalent. The only difference is that in the first, you can reuse the compiled regular expression elsewhere without causing it to be compiled again.

Here's a reference for you: http://diveintopython3.ep.io/refactoring.html

Calling the compiled pattern object's search function with the string 'M' accomplishes the same thing as calling re.search with both the regular expression and the string 'M'. Only much, much faster. (In fact, the re.search function simply compiles the regular expression and calls the resulting pattern object's search method for you.)
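To make the equivalence concrete, here is a minimal sketch (the pattern is my own, in the spirit of that chapter's roman-numeral matching):

>>> import re
>>> thousands = re.compile('^M?M?M?$')       # compile once, reuse the pattern object
>>> bool(thousands.search('M'))
True
>>> bool(re.search('^M?M?M?$', 'M'))         # same result; the function compiles (or hits the cache) first
True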

长期被迫恋爱
#6 · 2018-12-31 22:06

Regular expressions are compiled before being used even with the second version; the compilation just happens at call time. If you are going to execute the pattern many times, it is definitely better to compile it first. If not, compiling on the fly for a one-off match is fine.
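A rough sketch of the two situations (the data and names are just illustrative):

import re

# Many executions: compile once, outside the loop
word = re.compile(r'\bhello\b')
lines = ['hello world', 'goodbye', 'hello again']
greetings = [line for line in lines if word.search(line)]

# One-off: the module-level function is perfectly fine
if re.match('hello', 'hello world'):
    print('matched')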

听够珍惜
#7 · 2018-12-31 22:07

I've had a lot of experience running a compiled regex 1000s of times versus compiling on-the-fly, and have not noticed any perceivable difference. Obviously, this is anecdotal, and certainly not a great argument against compiling, but I've found the difference to be negligible.

EDIT: After a quick glance at the actual Python 2.5 library code, I see that Python internally compiles AND CACHES regexes whenever you use them anyway (including calls to re.match()), so you're really only changing WHEN the regex gets compiled, and shouldn't be saving much time at all - only the time it takes to check the cache (a key lookup on an internal dict type).

From module re.py (comments are mine):

def match(pattern, string, flags=0):
    return _compile(pattern, flags).match(string)

def _compile(*key):

    # Does cache check at top of function
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)
    if p is not None: return p

    # ...
    # Does actual compilation on cache miss
    # ...

    # Caches compiled regex
    if len(_cache) >= _MAXCACHE:
        _cache.clear()
    _cache[cachekey] = p
    return p

I still often pre-compile regular expressions, but only to bind them to a nice, reusable name, not for any expected performance gain.
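For example (my own names, chosen purely for readability; performance is essentially the same either way because of the cache):

import re

TRAILING_WS = re.compile(r'\s+$')                        # strips trailing whitespace
EMAIL_LIKE = re.compile(r'[^@\s]+@[^@\s]+\.[^@\s]+')     # rough email shape

def clean(line):
    return TRAILING_WS.sub('', line)

def looks_like_email(text):
    return EMAIL_LIKE.search(text) is not None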
