How to send utf-8 content in a urllib2 request?

Posted 2020-07-30 00:40

Question:

I've been struggling with the following problem for the past half day, and although I've found some information about similar problems, nothing quite matches my case.

I'm trying to send a PUT request using urllib2 with data that contains some Unicode characters:

import urllib2

body = u'{ "bbb" : "asdf\xd7\xa9\xd7\x93\xd7\x92"}'
conn = urllib2.Request(request_url, body, headers)
# urllib2 has no PUT support of its own, so override the method name
conn.get_method = lambda: 'PUT'
response = urllib2.urlopen(conn)

I've tried to use body = body.encode('utf-8') and other variations, but whatever I do I get the following error:

UnicodeEncodeError at ...
'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)

With one of the following call stacks:

File "..." in ...
  195.         response = urllib2.urlopen(conn)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in urlopen
  126.     return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in open
  394.         response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in _open
  412.                                   '_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in _call_chain
  372.             result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in http_open
  1199.         return self.do_open(httplib.HTTPConnection, req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in do_open
  1168.             h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in request
  955.         self._send_request(method, url, body, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in _send_request
  989.         self.endheaders(body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in endheaders
  951.         self._send_output(message_body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in _send_output
  815.             self.send(message_body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in send
  787.             self.sock.sendall(data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py" in meth
  224.     return getattr(self._sock,name)(*args)

Or the following call stack (for when I do body = body.encode('utf-8')):

File "..." in ...
  195.         response = urllib2.urlopen(conn)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in urlopen
  126.     return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in open
  394.         response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in _open
  412.                                   '_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in _call_chain
  372.             result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in http_open
  1199.         return self.do_open(httplib.HTTPConnection, req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in do_open
  1168.             h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in request
  955.         self._send_request(method, url, body, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in _send_request
  989.         self.endheaders(body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in endheaders
  951.         self._send_output(message_body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in _send_output
  809.             msg += message_body

What am I doing wrong? How can I send a body with Unicode characters via urllib2? If there are no Unicode characters, everything works fine.

Also note that my Content-Type header is set to application/json;charset=utf-8.

If it's relevant in any way, the context is this: my Django server receives a request and delegates it to another Django server. I don't redirect; I send the request from my own server, get the response, and send it back. So body is the request.body in the Django view.

Edit:

My headers are:

{
'Origin': 'http://10.0.0.146:8000', 
'Accept-Language': 'en-US,en;q=0.8', 
'Accept-Encoding': 'gzip,deflate,sdch', 
'Host': 'localhost:5000', 
'Accept': 'application/json, text/plain, */*', 
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31', 
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3', 
'Connection': 'keep-alive', 
'X-Requested-With': 'XMLHttpRequest', 
'Pragma': 'no-cache', 
'Cache-Control': 'no-cache', 
'Referer': 'http://localhost:5000/', 
'Content-Type': 'application/json;charset=UTF-8', 
'Authorization': 'ApiKey ogkLPgSESNyTOgIdbSLDhJjvyVJcbg:0d5897b5204c2f2527f532c6a97ba18a7f06acdc', 
'Cookie': 'username=ogkLPgSESNyTOgIdbSLDhJjvyVJcbg; _we_wk_ls_=%7B%22time%22%3A1369123506709%7D; __jwpusr=39e63770-ec5c-4b96-9f7f-b199703d0d36; sessionid=0d741a7560258b301979a1c853b42a81; api_key=0d5897b5204c2f2527f532c6a97ba18a7f06acdc'
}

Answer 1:

You need to pass only byte strings to Request. This applies to the headers, the URL, and the body.

If any of those three inputs contains a unicode value, Python 2 implicitly converts between unicode and byte strings using the ASCII codec when httplib concatenates them (the msg += message_body line in the second call stack), and that conversion fails as soon as non-ASCII data is involved.
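
As a rough illustration of that advice, here is a minimal sketch that encodes all three inputs before they reach urllib2. The helper names to_utf8 and encode_request_parts, the example URL, and the Hebrew sample text are assumptions made for this example, not part of the question. It also assumes the URL is already percent-encoded, so encoding it to UTF-8 only changes its type.

```python
def to_utf8(value):
    """Return value as a UTF-8 byte string; byte strings pass through unchanged."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value
    return value.encode('utf-8')


def encode_request_parts(url, body, headers):
    """Encode the URL, the body, and every header name and value to byte strings."""
    encoded_headers = dict(
        (to_utf8(name), to_utf8(val)) for name, val in headers.items()
    )
    return to_utf8(url), to_utf8(body), encoded_headers


# Hypothetical inputs: all three arrive as unicode objects.
url, body, headers = encode_request_parts(
    u'http://localhost:5000/api/items/1',
    u'{ "bbb" : "asdf\u05e9\u05d3\u05d2"}',
    {u'Content-Type': u'application/json;charset=UTF-8'},
)
```

On Python 2.7 the three results could then be passed to urllib2.Request(url, body, headers) as in the question. Since request.body in a Django view is already a byte string, the header dict and the URL are the more likely place for a stray unicode value in the setup described above.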