Python: Convert Unicode-hex UTF-8 strings to Unicode

Posted 2019-04-08 10:49

I have s = u'Gaga\xe2\x80\x99s' but need to convert it to t = u'Gaga\u2019s'.

How can this be best achieved?

3 answers
Evening l夕情丶
#2 · 2019-04-08 10:51
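s = u'Gaga\xe2\x80\x99s'
t = u'Gaga\u2019s'
# raw-unicode-escape writes every code point below U+0100 as a single byte,
# which recovers the original UTF-8 bytes; decoding them as UTF-8 fixes the text
x = s.encode('raw-unicode-escape').decode('utf-8')
assert x == t

print(x)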

yields

Gaga’s
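One difference from the Latin-1 approach is worth knowing. Below is a minimal Python 3 sketch; the input string with a trailing euro sign is a made-up example, not from the question. `raw-unicode-escape` writes any character at or above U+0100 as a `\uXXXX` escape instead of raising, so text that mixes mojibake with genuine non-Latin-1 characters comes back with literal escape text in it, while `encode('latin-1')` fails loudly.

```python
# Hypothetical input: mojibake for U+2019 followed by a genuine euro sign (U+20AC).
mixed = 'Gaga\xe2\x80\x99s \u20ac'

# raw-unicode-escape turns U+20AC into the six ASCII bytes b'\\u20ac' instead of failing,
# so the euro sign silently becomes the literal text \u20ac after decoding.
via_escape = mixed.encode('raw-unicode-escape').decode('utf-8')
print(repr(via_escape))

# latin-1 cannot represent U+20AC, so it raises rather than corrupting the text.
latin1_result = None
try:
    latin1_result = mixed.encode('latin-1').decode('utf-8')
except UnicodeEncodeError as exc:
    print('latin-1 refused:', exc.reason)
```

For input that is purely mojibake, as in the question, both approaches give the same result.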
一纸荒年 Trace。
#3 · 2019-04-08 10:53

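Wherever you decoded the original string, it was likely decoded as Latin-1 (ISO-8859-1) instead of UTF-8. The \x80 and \x99 code points point to Latin-1 specifically, since cp1252 would have turned those bytes into € and ™. Because Latin-1 maps bytes 0-255 directly onto the first 256 Unicode code points, encoding back to Latin-1 recovers the original bytes, which can then be decoded as UTF-8: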

>>> s = u'Gaga\xe2\x80\x99s'
>>> s.encode('latin-1').decode('utf8')
u'Gaga\u2019s'
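To see why this works, the damage can be reproduced in the forward direction. This is a minimal Python 3 sketch, assuming the text was originally UTF-8 and was decoded as Latin-1 somewhere upstream:

```python
# The intended text contains U+2019 RIGHT SINGLE QUOTATION MARK.
original = 'Gaga\u2019s'

# UTF-8 encodes U+2019 as the three bytes E2 80 99.
raw = original.encode('utf-8')
print(raw)

# Decoding those bytes as Latin-1 maps each byte to the code point of the same value,
# which yields exactly the string from the question.
garbled = raw.decode('latin-1')
print(repr(garbled))

# Reversing both steps recovers the original text.
repaired = garbled.encode('latin-1').decode('utf-8')
print(repaired == original)
```

The fix is only safe when the mis-decoding really was Latin-1; if a different codec was used upstream, the same reverse-then-redecode idea applies with that codec instead.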
Juvenile、少年°
#4 · 2019-04-08 11:17
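import codecs

s = u"Gaga\xe2\x80\x99s"
# charmap_encode with no mapping encodes each code point below 256 as one byte (like Latin-1)
s_as_str = codecs.charmap_encode(s)[0]
# decode those bytes as UTF-8 (works on str in Python 2 and on bytes in Python 3)
t = s_as_str.decode("utf-8")
print(repr(t))

prints (under Python 2; Python 3 shows 'Gaga’s' instead)

u'Gaga\u2019s'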
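`codecs.charmap_encode` is a lower-level helper than `str.encode`. Called without a mapping, it behaves like Latin-1 and returns a `(bytes, length_consumed)` tuple, which is why the answer takes `[0]`. A quick Python 3 check of that equivalence, assuming CPython's `codecs` module (which re-exports `charmap_encode` from `_codecs`):

```python
import codecs

s = 'Gaga\xe2\x80\x99s'

# With no mapping argument, charmap_encode encodes each code point below 256 as one byte
# and also reports how many characters it consumed.
encoded, consumed = codecs.charmap_encode(s)
print(encoded, consumed)

# The more common str.encode('latin-1') produces the same bytes.
print(encoded == s.encode('latin-1'))

# Decoding as UTF-8 gives the intended right single quotation mark.
print(encoded.decode('utf-8'))
```

In practice `s.encode('latin-1')` is the more readable spelling of the same step.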