Possible Duplicate:
How do I treat an ASCII string as unicode and unescape the escaped characters in it in python?
How do convert unicode escape sequences to unicode characters in a python string
I have a string that contains unicode escapes, e.g. \u2026. Somehow I do not receive it as unicode, but as a str. How do I convert it back to unicode?
>>> a="Hello\u2026"
>>> b=u"Hello\u2026"
>>> print a
Hello\u2026
>>> print b
Hello…
>>> print unicode(a)
Hello\u2026
>>>
So clearly unicode(a) is not the answer. Then what is?
Unicode escapes only work in unicode strings, so in a plain str the \u2026 is not one character but 6: '\', 'u', '2', '0', '2', '6'.
To make unicode out of this, use decode('unicode-escape'), i.e. a.decode('unicode-escape').
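A minimal sketch of the point above. The names raw, escape and text are illustrative and not from the question; the b prefix only makes the byte-string type explicit, so the same lines should behave the same on Python 2.6+ and on Python 3.

```python
# 'raw' stands in for the str from the question: the escape is stored as
# six literal characters, not as the single character U+2026.
raw = b"Hello\\u2026"
escape = raw[5:]                 # everything after "Hello"
assert len(escape) == 6          # '\', 'u', '2', '0', '2', '6'

# Decoding with the unicode-escape codec interprets the escape sequence.
text = raw.decode("unicode-escape")
assert text == u"Hello\u2026"
assert len(text) == 6            # "Hello" plus one ellipsis character
```

The decoded value is a unicode object on Python 2 (a str on Python 3), so printing it shows the ellipsis rather than the backslash sequence.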
Decode it with the unicode-escape codec.
This is because for a non-unicode string the \u2026 is not recognised as an escape but is instead treated as a literal series of characters (to put it more clearly, 'Hello\\u2026'). You need to decode the escapes, and the unicode-escape codec can do that for you.
Note that you can get unicode to recognise it in the same way by specifying the codec argument, as in unicode(a, 'unicode-escape'). But the a.decode() way is nicer.
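The constructor form might look like the following sketch. The text_type alias is an assumption added only so the snippet also runs on Python 3, where unicode no longer exists and str takes its place; on Python 2 it is simply unicode.

```python
try:
    text_type = unicode          # Python 2
except NameError:
    text_type = str              # Python 3 stand-in (assumption, see above)

raw = b"Hello\\u2026"            # same six-character escape as in the question

# Passing the codec as the second argument decodes while constructing.
via_constructor = text_type(raw, "unicode-escape")

# The method call gives the same result.
via_method = raw.decode("unicode-escape")
assert via_constructor == via_method == u"Hello\u2026"
```

Both spellings go through the same codec, so the choice between them is a matter of style.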