Can't get Python regex backreferences to work

Posted 2019-02-26 05:05

I want to match the docstrings of a Python file, e.g.:

r""" Hello this is Foo
     """

Using only """ should be enough for the start.

>>> data = 'r""" Hello this is Foo\n     """'
>>> def display(m):
...     if not m:
...             return None
...     else:
...             return '<Match: %r, groups=%r>' % (m.group(), m.groups())
...
>>> import re
>>> print display(re.match('r?"""(.*?)"""', data, re.S))
<Match: 'r""" Hello this is Foo\n     """', groups=(' Hello this is Foo\n     ',)>
>>> print display(re.match('r?(""")(.*?)\1', data, re.S))
None

Can someone please explain to me why the first expression matches and the other does not?

2 Answers
一纸荒年 Trace。
#2 · 2019-02-26 05:15

I think you might be missing the re.DOTALL or re.MULTILINE flags. In this case, re.DOTALL should allow the .*? in your regex to match newlines as well.
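
As a rough illustration of what re.DOTALL changes, here is a minimal sketch that reuses the data string and the first pattern from the question. It is written for Python 3 (print as a function), and the names pattern, no_flag and with_flag are only for this example. Note that the calls in the question already pass re.S, which is the short alias for re.DOTALL.

```python
import re

# Same sample docstring as in the question.
data = 'r""" Hello this is Foo\n     """'
pattern = 'r?"""(.*?)"""'

# Without re.DOTALL, "." does not match "\n", so the lazy group can never
# reach the closing triple quotes on the second line.
no_flag = re.match(pattern, data)

# With re.DOTALL (alias re.S), "." also matches "\n".
with_flag = re.match(pattern, data, re.DOTALL)

print(no_flag)                    # no match object is returned
print(repr(with_flag.group(1)))   # the docstring body, newline included
```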

手持菜刀,她持情操
#3 · 2019-02-26 05:38

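In a normal (non-raw) string literal, Python treats \1 as an escape sequence (the character with octal value 1) before the regex engine ever sees it, so your pattern contains that control character instead of the backreference \1.

You can fix this by escaping the backslash before the 1.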

print display(re.match('r?(""")(.*?)\\1', data, re.S))

You can also fix it by using a raw string for your regex, with no escape sequences.

print display(re.match(r'r?(""")(.*?)\1', data, re.S))
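
To see the difference directly, here is a minimal sketch that compares the two string literals and the match results. It uses Python 3 syntax (print as a function), and the variable names plain and raw are just for illustration.

```python
import re

# Same sample docstring as in the question.
data = 'r""" Hello this is Foo\n     """'

plain = 'r?(""")(.*?)\1'   # Python turns "\1" into the single character chr(1)
raw = r'r?(""")(.*?)\1'    # raw string: backslash and "1" both reach the regex engine

# The last character of each pattern shows what the regex engine receives.
print(repr(plain[-1]))     # a control character, not a backreference
print(repr(raw[-2:]))      # a backslash followed by "1"

# The plain pattern looks for a literal chr(1), which the data does not contain.
print(re.match(plain, data, re.S))

# The raw pattern has a real backreference to group 1 (the opening quotes).
m = re.match(raw, data, re.S)
print(m.groups())
```

Writing regex patterns as raw strings by default avoids this class of problem, since Python then leaves every backslash for the regex engine to interpret.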