Can anyone explain why example 1 below works when the r prefix is not used?
I thought the r prefix must be used whenever escape sequences are used.
Example 2 and example 3 demonstrate this.
# example 1
import re
print (re.sub('\s+', ' ', 'hello there there'))
# prints 'hello there there' - not expected as r prefix is not used
# example 2
import re
print (re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello there there'))
# prints 'hello there' - as expected as r prefix is used
# example 3
import re
print (re.sub('(\b\w+)(\s+\1\b)+', '\1', 'hello there there'))
# prints 'hello there there' - as expected as r prefix is not used
Not all sequences involving backslashes are escape sequences. \t and \f are, for example, but \s is not. In a non-raw string literal, any \ that is not part of an escape sequence is seen as just another \. \b is an escape sequence, however, so example 3 fails. (And yes, some people consider this behaviour rather unfortunate.)
Because backslashes begin escape sequences only when they are valid escape sequences.
Never rely on raw strings for path literals, as raw strings have some rather peculiar inner workings, known to have bitten people in the ass:
To better illustrate this last point:
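The illustration that originally followed here did not survive, so the following is only a sketch of one well-known raw-string quirk, assumed to be the kind of thing meant: a backslash in front of a quote keeps the quote from ending the literal but stays in the value, and a raw string literal cannot end in a single backslash. The variable names are made up for the example.

```python
# In a raw string, a backslash before a quote stops the quote from
# closing the literal, but the backslash itself stays in the value.
quoted = r'\''
print(len(quoted))  # 2: a backslash and a single quote

# A raw string literal that ends in a single backslash does not parse.
# The source text r'\' is compiled here so the failure can be caught.
try:
    compile("r'\\'", "<example>", "eval")
    trailing_backslash_ok = True
except SyntaxError:
    trailing_backslash_ok = False

print(trailing_backslash_ok)  # False
```

So a Windows-style directory path that ends in a backslash cannot be written as a single raw string literal, which is one way path literals get people into trouble.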
The 'r' means that the following is a "raw string", i.e. backslash characters are treated literally instead of signifying special treatment of the following character.
http://docs.python.org/reference/lexical_analysis.html#literals
So '\n' is a single newline and r'\n' is two characters - a backslash and the letter 'n'. Another way to write it would be '\\n', because the first backslash escapes the second.
An equivalent way of writing example 2 without the r prefix is to double every backslash in its string literals.
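As a rough sketch of that equivalence, using the pattern and input from example 2 in the question (the variable names are mine, not from the question):

```python
import re

# The raw-string pattern from example 2 ...
raw_pattern = r'(\b\w+)(\s+\1\b)+'
# ... and the same pattern as an ordinary literal with every backslash doubled.
doubled_pattern = '(\\b\\w+)(\\s+\\1\\b)+'

# Both literals produce the identical string object contents.
print(raw_pattern == doubled_pattern)  # True

# The replacement '\\1' is likewise the same two characters as r'\1'.
result = re.sub(doubled_pattern, '\\1', 'hello there there')
print(result)  # hello there
```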
Because of the way Python treats characters that are not valid escape characters, not all of those double backslashes are necessary - e.g. '\s' == '\\s', however the same is not true for '\b' and '\\b'. My preference is to be explicit and double all the backslashes.
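A small sketch to check this for yourself. One caveat, assuming a reasonably recent interpreter: newer Python versions emit a warning for unrecognised escapes such as '\s' in ordinary literals, so the sketch evaluates that literal from source text with warnings silenced instead of writing it directly. The variable name is just for illustration.

```python
import warnings

# '\b' is a recognised escape (backspace), so the two spellings differ.
print('\b' == '\x08')        # True: one backspace character
print(len('\b'), len('\\b'))  # 1 2

# '\1' is an octal escape, which is why the replacement in example 3
# is not a backreference.
print('\1' == '\x01')        # True

# '\s' is not a recognised escape, so the backslash is kept as-is.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    unrecognised = eval(r"'\s'")  # evaluates the source text '\s'

print(unrecognised == '\\s')  # True
```

This is another argument for the raw prefix or doubled backslashes: the result does not depend on which letters happen to be recognised escapes.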