Reading JSON: what encoding is “\u00c5\u0082”? How

Posted 2019-05-26 17:35

One of the values in a JSON file I'm parsing is Wroc\u00c5\u0082aw. How can I turn this string into a unicode object that yields "Wrocław" (which is the correct decoding in this case)?

2 answers
在下西门庆
Reply #2 · 2019-05-26 17:57

It looks like your JSON doesn't have the right encoding, because neither \u00c5 nor \u0082 yields the character you're expecting in any encoding.

But you could try encoding this value as UTF-8 or UTF-16.

爷的心禁止访问
Reply #3 · 2019-05-26 17:57

It looks like whatever process generated that JSON took UTF-8-encoded text and mistook it for Latin-1-encoded text. To fix the error, run the same process in reverse:

>>> u'Wroc\u00c5\u0082aw'.encode('iso-8859-1').decode('utf-8')
u'Wroc\u0142aw'
>>> import unicodedata
>>> unicodedata.name(u'\u0142')
'LATIN SMALL LETTER L WITH STROKE'
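
If you're on Python 3 and the value comes straight out of json.loads, the same round trip can be wrapped in a small helper. This is only a sketch under a few assumptions: the name fix_mojibake and the sample raw string are made up for illustration, and the fallback (return the text unchanged when the round trip fails) is my own choice, not something your data source guarantees is right.

```python
import json

def fix_mojibake(text):
    """Undo UTF-8 bytes that were misread as Latin-1.

    Hypothetical helper: returns the text unchanged if the round trip
    is not possible (i.e. the string was probably not damaged this way).
    """
    try:
        return text.encode('iso-8859-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

# Sample input (assumed): the JSON escapes decode to U+00C5 U+0082.
raw = '{"city": "Wroc\\u00c5\\u0082aw"}'
data = json.loads(raw)

city = fix_mojibake(data["city"])
print(city)  # Wrocław

# How the damage arises in the first place: UTF-8 bytes read as Latin-1.
damaged = 'Wroc\u0142aw'.encode('utf-8').decode('iso-8859-1')
print(damaged == data["city"])  # True
```

Be careful about applying this blindly to every string: a value that is already correct but happens to survive the round trip would be altered, so it is safer to use it only on fields you know were damaged.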