Make utf8 readable in a file

Posted 2020-04-10 02:34

Question:

I have a dictionary of dictionaries whose keys are UTF-8 encoded. I am dumping this dictionary to a file using the json module.
In the file the keys appear as escape sequences. The keys are actually letters of the Bengali language.

I want the actual letters to be written to the file. How can I do this?

If I print one of these keys (for example u'\u0982') to the console, the actual letter (ং) is shown, but in the file \u0982 is written. What does print do to show the actual letter?
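For reference, a minimal sketch reproducing the behaviour described above (Python 3 syntax; the dictionary contents are invented for illustration):

```python
import json

# A nested dictionary with a Bengali codepoint as a key (illustrative data)
data = {u'\u0982': {'count': 1}}

print(u'\u0982')          # prints the actual letter: ং
print(json.dumps(data))   # prints the escaped form: {"\u0982": {"count": 1}}
```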

Answer 1:

You are writing JSON; the JSON standard allows \uXXXX escape sequences to encode non-ASCII characters, and the Python json module escapes non-ASCII output this way by default.

Switch off this behaviour by passing ensure_ascii=False when dumping the data:

json.dump(obj, yourfileobject, ensure_ascii=False)

This does mean that the output is no longer limited to ASCII; on Python 2 the chunks written may be unicode objects, so you'll need a codecs.open() managed file to encode them to UTF-8:

import json
import codecs

with codecs.open('/path/to/file', 'w', encoding='utf8') as output:
    json.dump(obj, output, ensure_ascii=False)

Now your unicode characters are written to the file as UTF-8 encoded bytes. When you open the file with another program that decodes UTF-8, the codepoints are displayed as the same characters.
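On Python 3 the codecs module is not needed: the built-in open() accepts an encoding argument directly. A minimal sketch, assuming Python 3 and a writable temporary directory (the data and filename are illustrative):

```python
import json
import os
import tempfile

obj = {u'\u0982': {'letter': True}}  # illustrative data with a Bengali key

# Write the JSON with real characters instead of \uXXXX escapes
path = os.path.join(tempfile.gettempdir(), 'bengali.json')
with open(path, 'w', encoding='utf8') as output:
    json.dump(obj, output, ensure_ascii=False)

# Read it back: the key appears as the actual letter
with open(path, encoding='utf8') as f:
    print(f.read())  # {"ং": {"letter": true}}
```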



Answer 2:

Use the ensure_ascii parameter:

>>> import json
>>> print json.dumps(u'\u0982')
"\u0982"
>>> print json.dumps(u'\u0982', ensure_ascii=False)
"ং"

http://docs.python.org/2/library/json.html#json.dump

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is False, some chunks written to fp may be unicode instances. ...
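Both forms are valid JSON and decode back to the same string, so ensure_ascii only affects how the file looks, not what it means. A quick check (Python 3 syntax):

```python
import json

escaped = json.dumps(u'\u0982')                   # '"\\u0982"'
raw = json.dumps(u'\u0982', ensure_ascii=False)   # '"ং"'

# json.loads treats both representations identically
assert json.loads(escaped) == json.loads(raw) == u'\u0982'
```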