I'm trying to get a response from urllib and decode it to a readable format. The text is in Hebrew and also contains characters like { and /.
The coding declaration at the top of my source file is:
# -*- coding: utf-8 -*-
The raw response is:
b'\xff\xfe{\x00 \x00\r\x00\n\x00"\x00i\x00d\x00"\x00 \x00:\x00 \x00"\x001\x004\x000\x004\x008\x003\x000\x000\x006\x004\x006\x009\x006\x00"\x00,\x00\r\x00\n\x00"\x00t\x00i\x00t\x00l\x00e\x00"\x00 \x00:\x00 \x00"\x00\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05 \x00\xd4\x05\xe2\x05\xd5\x05\xe8\x05\xe3\x05 \x00\xd4\x05\xea\x05\xe8\x05\xe2\x05\xd4\x05 \x00\xd1\x05\xde\x05\xe8\x05\xd7\x05\xd1\x05 \x00"\x00,\x00\r\x00\n\x00"\x00d\x00a\x00t\x00a\x00"\x00 \x00:\x00 \x00[\x00]\x00\r\x00\n\x00}\x00\r\x00\n\x00\r\x00\n\x00'
Now I'm trying to decode it using:
data = data.decode()
and I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I got this error in Django with Python 3.4 while trying to get this to work with django-rest-framework. This was my code that fixed the UnicodeDecodeError: 'utf-8' codec can't decode byte error.
This is the passing test:
Your problem is that this data is not UTF-8. The leading \xff\xfe is a UTF-16 byte order mark, so you have UTF-16 encoded data; decode it as such:
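As a minimal sketch of that decoding step, the snippet below uses a shortened excerpt of the bytes from the question (the variable names raw, text and parsed are only illustrative). It assumes the payload is JSON, as the sample suggests.

```python
import json

# Shortened excerpt of the raw bytes from the question: a UTF-16 byte
# order mark (b'\xff\xfe') followed by little-endian UTF-16 text.
raw = (b'\xff\xfe{\x00"\x00t\x00i\x00t\x00l\x00e\x00"\x00:\x00"\x00'
       b'\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05"\x00}\x00')

# The 'utf-16' codec reads the BOM to pick the byte order and strips it.
text = raw.decode('utf-16')

# Once decoded, the text can be parsed as ordinary JSON.
parsed = json.loads(text)
print(parsed['title'])
```

Calling raw.decode() with no argument would raise the same UnicodeDecodeError as in the question, because the default codec is UTF-8 and 0xff is never a valid UTF-8 start byte.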
If you loaded this from a website with urllib.request, the Content-Type header should contain a charset parameter telling you this; if response is the returned urllib.request response object, then use:
This defaults to UTF-8 when no charset parameter has been set, which is the appropriate default for JSON data.
Alternatively, use the requests library to load the JSON response; it handles decoding automatically (including UTF-codec autodetection specific to JSON responses).
One further note: the PEP 263 source code codec comment is used only to interpret your source code, including string literals. It has nothing to do with encodings of external sources (files, network data, etc.).
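The Content-Type approach described above could look roughly like the following sketch. The helper decode_body is a hypothetical name introduced here, not part of urllib; it relies on get_content_charset(), which urllib's header object inherits from email.message.Message. A plain Message stands in for response.info() so the example runs without a network connection.

```python
from email.message import Message


def decode_body(raw, headers, default='utf-8'):
    """Decode raw bytes using the charset named in the Content-Type header."""
    # get_content_charset() returns the charset parameter (lower-cased),
    # or the fallback passed in when the header does not carry one.
    encoding = headers.get_content_charset(default)
    return raw.decode(encoding)


# Stand-in for response.info(): the header object urllib returns is a
# subclass of email.message.Message, so the same method is available there.
headers = Message()
headers['Content-Type'] = 'application/json; charset=utf-16'
text = decode_body('{"id": 1}'.encode('utf-16'), headers)
print(text)

# With a real response the call would look like this (url is a placeholder):
#   with urllib.request.urlopen(url) as response:
#       text = decode_body(response.read(), response.info())
```

This only helps when the server actually declares the charset. If the header is missing or wrong, decoding falls back to UTF-8 and bytes like those in the question would still fail.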