I have a UTF-8 character whose bytes are written as hex with '_' in between, e.g., '_ea_b4_80'. I'm trying to convert it into the UTF-8 character using the replace method, but I can't get the correct encoding.
This is a code example:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
r = '_ea_b4_80'
r2 = '\xea\xb4\x80'
r = r.replace('_', '\\x')
print r
print r.encode("utf-8")
print r2
In this example, r is not the same as r2. This is the output:
\xea\xb4\x80
\xea\xb4\x80
관 <-- correctly shown
What might be wrong?
`\x` is only meaningful in string literals; you can't use `replace` to add it. To get your desired result, convert to bytes, then decode:
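A minimal sketch of that conversion, assuming `binascii.unhexlify` is the import in question (it is named later in this answer); the variable names are illustrative:

```python
import binascii

r = '_ea_b4_80'

# Drop the separators so only the hex digits remain: 'eab480'
hex_only = r.replace('_', '')

# Turn each pair of hex digits into one raw byte: b'\xea\xb4\x80'
raw = binascii.unhexlify(hex_only)

# Interpret those bytes as UTF-8 text
text = raw.decode('utf-8')
print(text)
```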
which should get you 관 as you desire.
If you're using modern Py3, you can avoid the import (assuming `r` is in fact a `str`; `bytes.fromhex`, unlike `binascii.unhexlify`, only takes `str` inputs, not `bytes` inputs) by using the `bytes.fromhex` class method in place of `binascii.unhexlify`:
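A minimal sketch of the Python 3 variant, assuming `r` is a `str`; the variable names are illustrative:

```python
r = '_ea_b4_80'

# bytes.fromhex parses the hex digits directly, so no import is needed
raw = bytes.fromhex(r.replace('_', ''))

# Decode the resulting bytes as UTF-8 text
text = raw.decode('utf-8')
print(text)
```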