Posted 2019-04-08 10:49
贪生不怕死
Have s = u'Gaga\xe2\x80\x99s' but need to convert to t = u'Gaga\u2019s'
s = u'Gaga\xe2\x80\x99s'
t = u'Gaga\u2019s'
How can this be best achieved?
s = u'Gaga\xe2\x80\x99s'
t = u'Gaga\u2019s'
x = s.encode('raw-unicode-escape').decode('utf-8')
assert x == t
print(x)
yields
Gaga’s
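For readers wondering why the raw-unicode-escape round trip works, here is a minimal Python 3 sketch (the variable names are illustrative only). It shows that code points below 256 are encoded as single bytes, so the original UTF-8 bytes reappear, and that code points above U+00FF are written as backslash escapes instead, which is where this codec differs from latin-1.

```python
# Python 3 sketch: the u'' prefix is unnecessary because every str is Unicode.
s = "Gaga\xe2\x80\x99s"

# Code points below 256 become single bytes, recovering the UTF-8 byte sequence.
raw = s.encode("raw-unicode-escape")
print(raw)

# Decoding those bytes as UTF-8 gives the intended right single quotation mark.
fixed = raw.decode("utf-8")
print(fixed == "Gaga\u2019s")

# Code points above U+00FF are emitted as a literal backslash-u escape rather
# than raising an error, so an already-correct string would not round-trip.
escaped = "\u2019".encode("raw-unicode-escape")
print(escaped)
```

If the input might already contain characters above U+00FF, the latin-1 variant is probably the safer choice because it fails loudly instead of inserting escape text.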
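Wherever you decoded the original string, it was most likely decoded as latin-1 or a close relative. Because latin-1 maps one-to-one onto the first 256 code points of Unicode, encoding back to latin-1 recovers the original bytes, which can then be decoded as UTF-8: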
>>> s = u'Gaga\xe2\x80\x99s'
>>> s.encode('latin-1').decode('utf8')
u'Gaga\u2019s'
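The same latin-1 round trip can be wrapped in a small helper. This is only a sketch for Python 3: the name `fix_mojibake` is hypothetical, and returning the input unchanged when the round trip fails is an assumption about the desired behaviour, not something stated in the answer.

```python
def fix_mojibake(text):
    """Undo UTF-8 bytes that were mistakenly decoded as latin-1.

    Hypothetical helper: returns the input unchanged if it does not
    look like latin-1-decoded UTF-8.
    """
    try:
        # latin-1 maps code points 0-255 straight back to bytes 0-255.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # UnicodeEncodeError: the text has characters above U+00FF.
        # UnicodeDecodeError: the recovered bytes are not valid UTF-8.
        return text


repaired = fix_mojibake("Gaga\xe2\x80\x99s")
print(repaired == "Gaga\u2019s")

# A string that is already correct passes through untouched.
unchanged = fix_mojibake("Gaga\u2019s")
print(unchanged == "Gaga\u2019s")
```

Note that if the text was actually mis-decoded as Windows-1252 (the "close relative" mentioned above), some bytes will not survive a latin-1 round trip, and encoding with `cp1252` instead may be needed.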
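import codecs

s = u"Gaga\xe2\x80\x99s"
# Python 2 only. With no mapping argument, charmap_encode behaves like a
# latin-1 encode here and returns a (bytes, length) tuple.
s_as_str = codecs.charmap_encode(s)[0]
t = unicode(s_as_str, "utf-8")
print repr(t)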
prints
u'Gaga\u2019s'