I have a dictionary of dictionaries whose keys are UTF-8 encoded. I am dumping this dictionary to a file using the json
module.
In the file the keys are written as escape sequences. The keys are actually letters of the Bengali language.
I want the actual letters to be written to the file. How can I do this?
If I print one of those keys (for example u'\u0982') to the console, the actual letter (ং) is shown, but in my file \u0982
is written. What does print do to show the actual letter?
You are writing JSON; the JSON standard allows for \uxxxx
escape sequences to encode non-ASCII characters. The Python json
module uses this by default.
Switch off this behaviour by passing ensure_ascii=False
when dumping the data:
json.dump(obj, yourfileobject, ensure_ascii=False)
This does mean that the output is no longer a stream of ASCII-safe bytes; in Python 2 you'll need a codecs.open()
managed file to handle the UTF-8 encoding for you:
import json
import codecs
with codecs.open('/path/to/file', 'w', encoding='utf8') as output:
    json.dump(obj, output, ensure_ascii=False)
Now your unicode characters will be written to the file as UTF-8 encoded bytes instead of \uXXXX escapes. When you open the file with another program that decodes UTF-8, your codepoints will be displayed as the same characters.
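Put together, a minimal round trip looks like this. This is a sketch, not your exact code: the temporary path and the sample dictionary are placeholders standing in for your data.

```python
# -*- coding: utf-8 -*-
import codecs
import json
import os
import tempfile

# Sample nested dict with a Bengali key, standing in for your data
data = {u'\u0982': {u'letter': u'\u0982'}}

path = os.path.join(tempfile.mkdtemp(), 'bengali.json')

# Write the JSON with raw characters instead of \uXXXX escapes
with codecs.open(path, 'w', encoding='utf8') as output:
    json.dump(data, output, ensure_ascii=False)

# The file now contains the UTF-8 bytes of the character, not the escape
with codecs.open(path, 'r', encoding='utf8') as f:
    text = f.read()

assert u'\u0982' in text        # the raw character is present
assert u'\\u0982' not in text   # no \uXXXX escape sequence

# Reading back with json.load restores the original structure
with codecs.open(path, 'r', encoding='utf8') as f:
    assert json.load(f) == data
```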
Use the ensure_ascii
parameter:
>>> import json
>>> print json.dumps(u'\u0982')
"\u0982"
>>> print json.dumps(u'\u0982', ensure_ascii=False)
"ং"
http://docs.python.org/2/library/json.html#json.dump
If ensure_ascii is True (the default), all non-ASCII characters in the
output are escaped with \uXXXX sequences, and the result is a str
instance consisting of ASCII characters only. If ensure_ascii is
False, some chunks written to fp may be unicode instances. ...