Python re.sub() beginning-of-line anchoring

Posted 2019-06-25 14:49

Question:

Consider the following multiline string:

>>> print s
shall i compare thee to a summer's day?
thou art more lovely and more temperate
rough winds do shake the darling buds of may,
and summer's lease hath all too short a date.

re.sub() replaces every occurrence of and with AND:

>>> print re.sub("and", "AND", s)
shall i compare thee to a summer's day?
thou art more lovely AND more temperate
rough winds do shake the darling buds of may,
AND summer's lease hath all too short a date.

But re.sub() doesn't seem to let ^ anchor to the beginning of a line: adding it causes no occurrence of and to be replaced:

>>> print re.sub("^and", "AND", s)
shall i compare thee to a summer's day?
thou art more lovely and more temperate
rough winds do shake the darling buds of may,
and summer's lease hath all too short a date.

How can I use re.sub() with start-of-line (^) or end-of-line ($) anchors?

Answer 1:

You forgot to enable multiline mode.

re.sub("^and", "AND", s, flags=re.M)

re.M
re.MULTILINE

When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.

Source: the Python re module documentation.
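
The behaviour described in that excerpt can be checked with a short sketch. It rebuilds the string s from the question (the exact newlines are an assumption, since the question only shows the printed output) and uses a parenthesized print so that it should run on Python 2.7 and Python 3 alike. The names default_result, multiline_result and end_result are illustrative only.

```python
import re

# The four lines from the question, joined with newlines (assumed layout).
s = (
    "shall i compare thee to a summer's day?\n"
    "thou art more lovely and more temperate\n"
    "rough winds do shake the darling buds of may,\n"
    "and summer's lease hath all too short a date."
)

# Default mode: ^ matches only at the very start of the string,
# so "^and" finds nothing and the text comes back unchanged.
default_result = re.sub("^and", "AND", s)

# Multiline mode: ^ also matches right after each newline,
# so only the "and" that opens the last line is replaced.
multiline_result = re.sub("^and", "AND", s, flags=re.M)

# $ follows the same rule: with re.M it also matches just before each
# newline, so the comma that ends the third line can be targeted.
end_result = re.sub(",$", ";", s, flags=re.M)

print(multiline_result)
```

The "and" in the middle of the second line is left alone in every case, because it never sits at the start of a line.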

The flags argument of re.sub() isn't available in Python versions older than 2.7; in those cases you can set the flag inline in the regular expression itself, like so:

re.sub("(?m)^and", "AND", s)
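
As a rough check that the inline flag and the flags argument behave the same, the sketch below applies both to a small made-up string, along with a pattern compiled with re.M. The sample text and the variable names are hypothetical and not taken from the question.

```python
import re

# Hypothetical sample text with "and" at the start of two lines.
s = "first line\nand second line\nand third line"

# Inline flag: (?m) at the start of the pattern enables multiline mode.
inline_result = re.sub(r"(?m)^and", "AND", s)

# Keyword flag, as in the answer above.
keyword_result = re.sub(r"^and", "AND", s, flags=re.M)

# Compiled pattern: the flag is given once to re.compile().
line_start_and = re.compile(r"^and", re.M)
compiled_result = line_start_and.sub("AND", s)

print(inline_result)
```

Note that the flag has to be passed by keyword: the fourth positional parameter of re.sub() is count, so re.sub("^and", "AND", s, re.M) would be read as count=8 rather than as a flag.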


Answer 2:

Add (?m) for multiline:

print re.sub(r'(?m)^and', 'AND', s)

See the re module documentation for details.