How to match a newline character in a Python raw string

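I'm a bit confused about Python raw strings. I know that in a raw string "\" is treated as an ordinary backslash (for example, r"\n" is the two characters "\" and "n"). However, I'd like to know how to match a newline character with a raw string. I tried r"\n", but it didn't work. Does anyone have a good idea about this?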

Answer 1:

In a regular expression, you can specify multiline mode:

>>> import re
>>> s = """cat
... dog"""
>>> 
>>> re.match(r'cat\ndog',s,re.M)
<_sre.SRE_Match object at 0xcb7c8>

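Note that re itself translates the \n in the raw string into a newline. As you pointed out in your comment, you don't actually need re.M for this to match, but it does make matching with $ and ^ more intuitive: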

>>> re.match(r'^cat\ndog',s).group(0)
'cat\ndog'
>>> re.match(r'^cat$\ndog',s).group(0)  #doesn't match
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r'^cat$\ndog',s,re.M).group(0) #matches.
'cat\ndog'
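
To make the two layers explicit, here is a minimal sketch (the variable names are illustrative and not part of the original answer). It checks that r"\n" is two characters as a Python string, that the re module still treats that pair as a newline when it parses the pattern, and that re.M only changes where $ is allowed to match.

```python
import re

s = "cat\ndog"

# As a Python string, the raw literal holds two characters: a backslash and "n".
raw_pattern = r"\n"
assert len(raw_pattern) == 2

# The re module parses the backslash-n escape itself, so the pattern
# still matches a real newline character.
newline_match = re.search(raw_pattern, s)
assert newline_match is not None and newline_match.group(0) == "\n"

# Without re.M, "$" matches only at the end of the string (or just before
# a trailing newline), so "cat$" fails in the middle of the string.
no_flag_match = re.match(r"^cat$\ndog", s)
assert no_flag_match is None

# With re.M, "$" also matches just before every newline.
multiline_match = re.match(r"^cat$\ndog", s, re.M)
assert multiline_match.group(0) == "cat\ndog"
```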

Answer 2:

The simplest answer is to not use a raw string at all. You can escape each backslash by writing \\.

If some parts of the string contain a large number of backslashes, you can concatenate raw strings with normal strings as needed:

r"some string \ with \ backslashes" "\n"

(Python automatically concatenates string literals that have only whitespace between them.)
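
As a quick check, the following sketch compares the two approaches described in this answer; the variable names are made up for illustration. Both spellings should produce the same string, ending in one real newline character.

```python
# Option 1: a normal string with every backslash doubled.
escaped = "some string \\ with \\ backslashes\n"

# Option 2: a raw literal followed by a normal literal; Python joins
# adjacent string literals into a single string.
concatenated = r"some string \ with \ backslashes" "\n"

assert escaped == concatenated
assert concatenated.endswith("\n")      # a real newline, one character
assert concatenated.count("\\") == 2    # two literal backslashes
```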

Remember, if you are working with paths on Windows, the easiest option is to just use forward slashes; they still work.
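
One way to see that both spellings name the same path is with pathlib, which the original answer does not mention; the path below is hypothetical. PureWindowsPath applies Windows path rules on any operating system, so this sketch should run anywhere.

```python
from pathlib import PureWindowsPath

# Hypothetical path, used only for illustration.
with_backslashes = PureWindowsPath(r"C:\Users\me\notes.txt")
with_forward_slashes = PureWindowsPath("C:/Users/me/notes.txt")

# Both separators are accepted and compare as the same path.
assert with_backslashes == with_forward_slashes
assert with_forward_slashes.name == "notes.txt"
```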

Answer 3:

import re
from string import punctuation


def clean_with_puncutation(text):
    # Map every punctuation character to a token such as '<PUNC_!>'.
    punctuation_token = {p: '<PUNC_' + p + '>' for p in punctuation}
    punctuation_token['<br/>'] = "<TOKEN_BL>"
    punctuation_token['\n'] = "<TOKEN_NL>"
    punctuation_token['<EOF>'] = '<TOKEN_EOF>'
    punctuation_token['<SOF>'] = '<TOKEN_SOF>'

    # Always put the multi-character tokens at the front to avoid overlapping results.
    # Inside the raw string, the re module itself interprets \n as a newline.
    regex = (r"(<br/>)|(<EOF>)|(<SOF>)|[\n\!\@\#\$\%\^\&\*\(\)\[\]"
             r"\{\}\;\:\,\.\/\?\|\`\_\\+\\\=\~\-\<\>]")

    # Example input:
    # text = '<EOF>!@#$%^&*()[]{};:,./<>?\|`~-= _+\<br/>\n <SOF>\ '
    text_ = ""
    index = 0
    for match in re.finditer(regex, text):
        # Copy the text before the match, then the token padded with spaces.
        text_ = (text_ + text[index:match.start()] + " "
                 + punctuation_token[match.group()] + " ")
        index = match.end()
    # Append whatever follows the last match so the tail is not lost.
    return text_ + text[index:]
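
For the newline case alone, a shorter variant of the same idea can be sketched with re.sub and a replacement function. This is an illustrative reduction, not the answer's code: the names TOKENS, PATTERN and tokenize_breaks are hypothetical, and only "\n" and "<br/>" are handled.

```python
import re

# Hypothetical, reduced token table: only newline and "<br/>" are handled.
TOKENS = {"\n": "<TOKEN_NL>", "<br/>": "<TOKEN_BL>"}

# In a raw-string pattern, the re module interprets \n as a newline.
PATTERN = re.compile(r"<br/>|\n")


def tokenize_breaks(text):
    # Replace each match with its token, padded with spaces on both sides.
    return PATTERN.sub(lambda m: " " + TOKENS[m.group()] + " ", text)


result = tokenize_breaks("cat\ndog<br/>bird")
print(result)
```

Because re.sub copies the unmatched text through unchanged, there is no need to track an index by hand.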