I can't figure out how to match a string but not if it has a trailing newline character (\n), which seems to be automatically stripped:
import re
print(re.match(r'^foobar$', 'foobar'))
# <_sre.SRE_Match object; span=(0, 6), match='foobar'>
print(re.match(r'^foobar$', 'foobar\n'))
# <_sre.SRE_Match object; span=(0, 6), match='foobar'>
print(re.match(r'^foobar$', 'foobar\n\n'))
# None
For me, the second case should also return None.
When we set the end of a pattern with $, like ^foobar$, it should only match a string like foobar, not foobar\n.
What am I missing?
The documentation says this about the $ character: it matches at the end of the string or just before the newline at the end of the string, and in MULTILINE mode it also matches before a newline.
So, without the MULTILINE option, it matches exactly the first two strings you tried, 'foobar' and 'foobar\n', but not 'foobar\n\n', because there the newline that follows foobar is not the last character of the string.
On the other hand, if you choose the MULTILINE option, $ will match at the end of any line. Of course, this will also match when more lines follow the first one, which may or may not be what you want.
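To make the difference concrete, here is a minimal sketch comparing the default behavior of $ with re.MULTILINE. The sample strings, including 'foobar\nbaz', and the helper name `matches` are illustrative assumptions, not taken from the original answer.

```python
import re

PATTERN = r'^foobar$'
# Illustrative inputs; the last one has a second line after "foobar".
SAMPLES = ['foobar', 'foobar\n', 'foobar\n\n', 'foobar\nbaz']


def matches(pattern, text, flags=0):
    """Return True if re.match finds a match at the start of text."""
    return re.match(pattern, text, flags) is not None


# Default mode: $ matches at the end of the string or just before
# a single trailing newline.
default_results = {s: matches(PATTERN, s) for s in SAMPLES}

# MULTILINE mode: $ also matches before any newline.
multiline_results = {s: matches(PATTERN, s, re.MULTILINE) for s in SAMPLES}

for s in SAMPLES:
    print(repr(s), default_results[s], multiline_results[s])
```

In default mode only the first two samples match; with re.MULTILINE all four do, because re.match only needs the first line to be foobar.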
In order to NOT match the ending newline, use the negative lookahead as DeepSpace wrote.
You more likely don't need $ but rather \Z, which matches only at the end of the string.
This is the defined behavior of $, as can be read in the docs that @zvone linked to or even on https://regex101.com. You can use an explicit negative lookahead, such as (?!\n) placed before $, to counter this behavior.
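As a rough sketch of the two suggestions, the snippet below runs both a \Z-anchored pattern and a negative-lookahead pattern against the strings from the question. The variable names are assumptions chosen for illustration, and the exact patterns in the original answers may have differed.

```python
import re

SAMPLES = ['foobar', 'foobar\n', 'foobar\n\n']

# \Z matches only at the very end of the string, so a trailing
# newline prevents a match.
end_anchor = re.compile(r'^foobar\Z')

# (?!\n) rejects a newline immediately after "foobar", so $ can
# only succeed at the true end of the string.
lookahead = re.compile(r'^foobar(?!\n)$')

z_results = [end_anchor.match(s) is not None for s in SAMPLES]
lookahead_results = [lookahead.match(s) is not None for s in SAMPLES]

print(z_results)
print(lookahead_results)
```

Both patterns accept only 'foobar' and reject the variants with trailing newlines. The \Z form is shorter, while the lookahead keeps $ in place if the rest of the pattern relies on it.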