According to the docs, the builtin string encoding string_escape:
Produce[s] a string that is suitable as string literal in Python source code
...while unicode_escape:
Produce[s] a string that is suitable as Unicode literal in Python source code
So, they should have roughly the same behaviour. BUT, they appear to treat single quotes differently:
>>> print """before '" \0 after""".encode('string-escape')
before \'" \x00 after
>>> print """before '" \0 after""".encode('unicode-escape')
before '" \x00 after
string_escape escapes the single quote while the Unicode one does not. Is it safe to assume that I can simply:
>>> escaped = my_string.encode('unicode-escape').replace("'", "\\'")
...and get the expected behaviour?
Edit: Just to be super clear, the expected behavior is getting something suitable as a literal.
Within the range 0 ≤ c < 128, yes, the ' is the only difference for CPython 2.6. Outside of this range the two types are not exchangeable.
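One way to convince yourself for the ASCII range is to round-trip every code point below 128 through the proposed workaround and evaluate the result as a literal. The following is only a sketch, written so that it should run on Python 2.6+ as well as 3.3+. The helper name escape_for_single_quotes is made up for illustration, and wrapping the result in u'...' is an assumption about how the escaped text will be used.

```python
import ast

def escape_for_single_quotes(text):
    # Hypothetical helper: unicode_escape already doubles every backslash,
    # so putting one in front of each ' cannot clash with an existing escape.
    return text.encode('unicode_escape').replace(b"'", b"\\'")

# Every code point in the range 0 <= c < 128.
ascii_chars = u''.join(chr(c) for c in range(128))

escaped = escape_for_single_quotes(ascii_chars)

# Wrap the escaped text in single quotes and evaluate it as a literal.
literal = u"u'" + escaped.decode('ascii') + u"'"
round_tripped = ast.literal_eval(literal)

assert round_tripped == ascii_chars
```

If the assertion holds, the escaped text is usable inside a single-quoted literal for that range. It says nothing about double-quoted or triple-quoted literals.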
On Python 3.x, the string_escape encoding no longer exists, since str can only store Unicode.

According to my interpretation of the implementation of unicode-escape and the unicode repr in the CPython 2.6.5 source, yes; the only difference between repr(unicode_string) and unicode_string.encode('unicode-escape') is the inclusion of wrapping quotes and escaping whichever quote was used.

They are both driven by the same function, unicodeescape_string. This function takes a parameter whose sole function is to toggle the addition of the wrapping quotes and escaping of that quote.
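The relationship between the repr and the codec can be checked from Python code without reading the C source. The sketch below assumes Python 3, where ascii() plays roughly the role that the unicode repr plays in Python 2 (it escapes non-ASCII characters and adds the wrapping quotes). The helper name repr_body is hypothetical.

```python
def repr_body(text):
    # Hypothetical helper: drop the wrapping quotes that ascii() adds.
    quoted = ascii(text)
    return quoted[1:-1]

# No quote characters: the two forms should match exactly.
sample = "caf\u00e9 \u20ac \x00 end"
escaped = sample.encode("unicode_escape").decode("ascii")
assert repr_body(sample) == escaped

# Both quote kinds present: ascii() wraps in single quotes and escapes ',
# which the codec leaves alone.
with_quotes = "it's \"x\""
escaped_quotes = with_quotes.encode("unicode_escape").decode("ascii")
assert repr_body(with_quotes) == escaped_quotes.replace("'", "\\'")
```

Note that when a string contains a single quote but no double quote, the repr switches to double quotes as the wrapper and escapes nothing, so the comparison only lines up once you account for which quote was chosen.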