I have a dictionary `data` where I have stored:
- `key`: the ID of an event
- `value`: the name of this event, where `value` is a UTF-8 string
Now I want to write this map to a JSON file. I tried this:
import json

with open('events_map.json', 'w') as out_file:
    json.dump(data, out_file, indent=4)
but this gives me the error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte
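In UTF-8, the bytes 0x80 through 0xBF are continuation bytes and can never begin a character, so a byte string that starts with 0xbf is rejected with "invalid start byte". Below is a minimal Python 3 sketch that reproduces the decoder's complaint. The question itself uses Python 2, where `str` holds bytes. The helper name `utf8_error_reason` is hypothetical.

```python
def utf8_error_reason(raw: bytes):
    """Return the UTF-8 decoder's reason string, or None if raw is valid UTF-8."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError as exc:
        # exc.reason is the short explanation shown at the end of the error message.
        return exc.reason
    return None
```

For example, `utf8_error_reason(b'\xbf/ANCT25')` returns `'invalid start byte'`, while a plain ASCII byte string returns `None`.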
I also tried:
import io

with io.open('events_map.json', 'w', encoding='utf-8') as out_file:
    out_file.write(unicode(json.dumps(data, encoding="utf-8")))
but this raises the same error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte
I also tried passing `ensure_ascii=False`:
with io.open('events_map.json', 'w', encoding='utf-8') as out_file:
    out_file.write(unicode(json.dumps(data, encoding="utf-8", ensure_ascii=False)))
but this raises the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbf in position 3114: ordinal not in range(128)
Any suggestions on how I can solve this problem?
EDIT: I believe this is the entry that is causing the problem:
> data['142']
'\xbf/ANCT25'
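You cannot tell which encoding is the "correct" one from a single byte. The Python 3 sketch below shows that Latin-1 and Windows-1252 both map 0xbf to the same character but disagree on other bytes. These two encodings are only common examples. The question does not say which encoding the source file actually uses.

```python
# The single byte from data['142'] that UTF-8 rejects.
byte = b'\xbf'

# In both Latin-1 and Windows-1252 it decodes to U+00BF INVERTED QUESTION MARK.
latin1 = byte.decode('latin-1')
cp1252 = byte.decode('cp1252')

# The two encodings disagree elsewhere, e.g. on 0x80, so one byte is
# not enough to tell which encoding the source file really uses.
euro_or_control = (b'\x80'.decode('cp1252'), b'\x80'.decode('latin-1'))
```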
EDIT 2:
The `data` variable is read from a file. After reading the file with:
data_file_lines = io.open(file_name, 'r', encoding='utf8').readlines()
I then do:
with io.open('data/events_map.json', 'w', encoding='utf8') as json_file:
    json.dump(data, json_file, ensure_ascii=False)
Which gives me the error:
TypeError: must be unicode, not str
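For comparison, here is a minimal Python 3 sketch. In Python 3, `json.dump` writes only `str` chunks, so it can write straight to a text file opened with `encoding='utf-8'`, and the file object does the encoding. The sample dictionary is hypothetical. A temporary directory stands in for `data/`.

```python
import json
import os
import tempfile

# Hypothetical sample data: event IDs as keys, event names as values.
data = {'142': '\u00bf/ANCT25', '7': 'Kickoff'}

path = os.path.join(tempfile.mkdtemp(), 'events_map.json')

# Text mode with an explicit encoding: json.dump writes str, the file encodes it.
with open(path, 'w', encoding='utf-8') as json_file:
    json.dump(data, json_file, ensure_ascii=False, indent=4)

# Read the raw bytes back to inspect what actually landed on disk.
with open(path, 'rb') as f:
    raw = f.read()
```

With `ensure_ascii=False` the character is stored as its UTF-8 bytes (`0xc2 0xbf`) rather than as a `\u00bf` escape.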
Then I tried building the `data` dictionary like this (the `data` variable is initialized from the tuples in `sorted_tuples`):
for tuple in sorted_tuples:
    data[str(tuple[1])] = json.dumps(tuple[0], ensure_ascii=False, encoding='utf8')
which is, again, followed by:
with io.open('data/events_map.json', 'w', encoding='utf8') as json_file:
    json.dump(data, json_file, ensure_ascii=False)
but again, the same error:
TypeError: must be unicode, not str
I get the same error when I use the plain built-in `open` function to read the file:
data_file_lines = open(file_name, "r").readlines()
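Separately, the `sorted_tuples` loop in EDIT 2 calls `json.dumps` on each value before the whole dictionary goes through `json.dump`. That encodes every name twice, because the stored value is already a JSON string literal with its quotes included. The Python 3 sketch below assumes `sorted_tuples` holds `(name, event_id)` pairs, as the indexing in that loop suggests. The sample values are hypothetical.

```python
import json

# Hypothetical input: (event name, event ID) pairs.
sorted_tuples = [('\u00bf/ANCT25', 142), ('Kickoff', 7)]

# Double encoding: each value becomes a JSON literal, which would be dumped again.
double = {str(t[1]): json.dumps(t[0], ensure_ascii=False) for t in sorted_tuples}

# Storing the plain strings leaves a single encoding step to json.dump.
data = {str(event_id): name for name, event_id in sorted_tuples}
```

The sketch also unpacks each pair into named variables instead of naming the loop variable `tuple`, which shadows the built-in.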
The exception is caused by the contents of your `data` dictionary: at least one of the keys or values is not UTF-8 encoded. You'll have to replace that value. Either substitute a value that is UTF-8 encoded, or decode just that value to a `unicode` object using whatever encoding is correct for it. For example, `data['142'] = data['142'].decode('latin-1')` decodes that string as a Latin-1-encoded value instead, where the byte 0xbf is the character `¿`.
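Here is a minimal Python 3 sketch of that fix applied to the entry from the question. The offending value is kept as bytes to mimic a Python 2 `str`. It is decoded as Latin-1, and the dictionary is then serialized with `ensure_ascii=False`, ready to be written to a file opened with `encoding='utf-8'`. The second dictionary entry is hypothetical, and `io.StringIO` stands in for the output file.

```python
import io
import json

# Mimic the Python 2 dictionary: one value is still undecoded bytes.
data = {'142': b'\xbf/ANCT25', '7': 'Kickoff'}

# Decode just the offending value with the encoding it actually uses.
data['142'] = data['142'].decode('latin-1')

# Serialize; with a real file, open it with encoding='utf-8' instead of StringIO.
buffer = io.StringIO()
json.dump(data, buffer, ensure_ascii=False)
output = buffer.getvalue()
```

Without the decode step, Python 3's `json.dump` would refuse the bytes value outright with a `TypeError`, which makes the offending entry easy to spot.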