I have a python application that encodes some objects to json, passes the json string to another program, and then reads in a possibly modified version of that json string.
I need to check what's changed in the json-encoded objects. However, I'm having trouble re-encoding non-ascii characters. For example:
import json
x = {'\xe2': None}  # a dict with non-ascii keys
y = json.dumps(x, ensure_ascii=False)
y
#> '{"\xe2": null}'
This works just fine, but when I try to load the json, I get:
json.loads(y)
#> UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 0
json.loads(y.decode('utf-8','ignore'))
#> "{u'': None}"
json.loads(y.decode('utf-8','replace'))
#> {u'\ufffd': None}
and unfortunately '\xe2' in {u'\ufffd': None} evaluates to False.
I'm willing to bet there is a simple solution, but all my googling and searching on SO has failed to find an adequate solution.
The easiest way to fix this is to go to the thing that is generating this dict and properly encode things there as UTF-8. Currently, your keys are encoded as CP-1252. If you can't fix it at the source, you'll need to do some post-processing.
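One way to do that post-processing is to walk the object and decode the byte-string keys and values to unicode before dumping, so json gets data it can round-trip. A minimal sketch (the to_unicode helper name is just illustrative), assuming Python 2.7 and that the byte strings really are CP-1252:

import json

def to_unicode(obj, encoding='cp1252'):
    # recursively decode byte strings (str) to unicode
    if isinstance(obj, str):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return dict((to_unicode(k, encoding), to_unicode(v, encoding))
                    for k, v in obj.items())
    if isinstance(obj, list):
        return [to_unicode(v, encoding) for v in obj]
    return obj

x = {'\xe2': None}
y = json.dumps(to_unicode(x), ensure_ascii=False)
json.loads(y)
#> {u'\xe2': None}

With unicode keys, dumps(..., ensure_ascii=False) returns a unicode string, and loads round-trips it without the UnicodeDecodeError.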
(Some of this answer assumes you're on Python 2; unicode handling differs in Python 3.)
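For comparison, a quick sketch of the same round trip on Python 3, where str is already unicode:

import json

x = {'â': None}  # on Python 3, str keys are already unicode
y = json.dumps(x, ensure_ascii=False)
json.loads(y)
#> {'â': None}
# bytes keys such as b'\xe2' would be rejected by json.dumps with a TypeError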