I'm trying to write a Python script that posts some JSON to a web server and gets some JSON back. I patched together a few different examples on StackOverflow, and I think I have something that's mostly working.
import urllib2
import json
url = "http://foo.com/API.svc/SomeMethod"
payload = json.dumps( {'inputs': ['red', 'blue', 'green']} )
headers = {"Content-type": "application/json;"}
req = urllib2.Request(url, payload, headers)
f = urllib2.urlopen(req)
response = f.read()
f.close()
data = json.loads(response) # <-- Crashes
The last line throws an exception:
ValueError: No JSON object could be decoded
When I look at response, I see valid JSON, but the first few characters are a BOM:
>>> response
'\xef\xbb\xbf[\r\n {\r\n ... Valid JSON here
So, if I manually strip out the first three bytes:
data = json.loads(response[3::])
Everything works and response is parsed into a Python object.
My Question:
It seems kinda silly that json barfs when you give it a BOM. Is there anything different I can do with urllib or the json library to let it know this is a UTF8 string and to handle it as such? I don't want to manually strip out the first 3 bytes.
In case I'm not the only one who has run into this problem but is using the requests module instead of urllib2, here is a solution that works in Python 2.6 as well as 3.3:
You should probably yell at whoever's running this service, because a BOM on UTF-8 text makes no sense. The BOM exists to disambiguate byte order, and UTF-8 is a byte-oriented encoding, so it has no byte order to disambiguate.
That said, ideally you should decode bytes before doing anything else with them. Luckily, Python has a codec that recognizes and removes the BOM: utf-8-sig. So you just need:
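A minimal sketch of that approach, assuming response holds the raw bytes read from the server. The sample bytes below are made up for illustration and stand in for the result of f.read() in the question:

```python
import json

# Hypothetical stand-in for the bytes returned by f.read():
# a UTF-8 BOM (EF BB BF) followed by a small JSON document.
response = b'\xef\xbb\xbf[\r\n {"color": "red"}\r\n]'

# The utf-8-sig codec decodes UTF-8 and drops a leading BOM if one is present.
text = response.decode('utf-8-sig')

# The decoded text no longer starts with the BOM, so json can parse it.
data = json.loads(text)
```

The same decode call also accepts input that has no BOM, so it should be safe to leave in place if the service is ever fixed.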
Since I lack enough reputation for a comment, I'll write an answer instead.
I usually encounter that problem when I need to leave the underlying Stream of a StreamWriter open. However, the overload that has the option to leave the underlying Stream open needs an encoding (which will be UTF8 in most cases). Here's how to do it without emitting the BOM.