URLDecoding requests

I am trying to recover the original (decoded) URL from a requests response. Here is what I have so far:

import urllib
import requests

res = requests.get(...)
url = urllib.unquote(res.url).decode('utf8')

I then get an error that says:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 60-61: ordinal not in range(128)

The original url I requested is:

https://www.microsoft.com/de-at/store/movies/american-pie-pr\xc3\xa4sentiert-nackte-tatsachen/8d6kgwzl63ql

And here is what happens when I try printing:

>>> print '111', res.url
111 https://www.microsoft.com/de-at/store/movies/american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql
>>> print '222', urllib.unquote( res.url )
222 https://www.microsoft.com/de-at/store/movies/american-pie-prÃ¤sentiert-nackte-tatsachen/8d6kgwzl63ql
>>> print '333', urllib.unquote(res.url).decode('utf8') 
UnicodeEncodeError: 'ascii' codec can't encode characters in position 60-61: ordinal not in range(128)

Why is this occurring, and how would I fix this?

UnicodeEncodeError: 'ascii' codec can't encode characters

You are calling .decode() on a string that is already Unicode: res.url is a unicode object, so urllib.unquote(res.url) returns unicode as well. On Python 3 this would raise AttributeError, because str has no .decode() method. Python 2 instead first implicitly encodes the string to bytes using sys.getdefaultencoding() ('ascii') and only then passes the result to .decode('utf8'). That implicit encoding step is what raises UnicodeEncodeError.
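To make the hidden step visible, here is a minimal Python 3 sketch that performs the ASCII encoding explicitly. The variable names (unquoted, error) are mine, and the string literal is an assumption about what Python 2's urllib.unquote() returns for the Unicode res.url in the question (the two characters U+00C3 U+00A4 in place of ä):

```python
# What Python 2's urllib.unquote() is assumed to return for the Unicode
# res.url above: the UTF-8 bytes C3 A4 come back as U+00C3 and U+00A4.
unquoted = (
    "https://www.microsoft.com/de-at/store/movies/"
    "american-pie-pr\u00c3\u00a4sentiert-nackte-tatsachen/8d6kgwzl63ql"
)

try:
    # The step Python 2 performs implicitly before .decode('utf8') runs.
    unquoted.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; 'exc' is unbound after the except block

# First and last offending positions, as reported in the traceback.
print(error.start, error.end - 1)  # 60 61
```

The positions match the "position 60-61" in the traceback, which suggests the failure happens in the implicit encode and not in the UTF-8 decode itself.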

In short, do not call .decode() on Unicode strings. Use this instead:

print urllib.unquote(res.url.encode('ascii')).decode('utf-8')

Without the .decode() call, the code prints bytes (assuming a bytestring is passed to unquote()), which may produce mojibake if the character encoding used by your environment is not utf-8. To avoid mojibake, always print Unicode rather than bytes, and do not hardcode the character encoding of your environment inside your script. In other words, the .decode() call is necessary here.

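urllib.unquote() misbehaves on Python 2 if you pass it a Unicode string: each percent-escaped byte is turned into a separate character (in effect a Latin-1 decode) instead of the bytes being decoded together as UTF-8: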

>>> print urllib.unquote(u'%C3%A4')
Ã¤
>>> print urllib.unquote('%C3%A4') # utf-8 output
ä

Pass bytestrings to unquote() on Python 2.
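For comparison, here is a rough Python 3 sketch, which is not part of the Python 2 code above. On Python 3, urllib.parse.unquote() takes a str and decodes percent-escapes as UTF-8 by default, so no encode/decode dance is needed. The names quoted, decoded, mojibake and repaired are illustrative; passing encoding="latin-1" is only my way of imitating the Python 2 behaviour shown above:

```python
from urllib.parse import unquote

quoted = (
    "https://www.microsoft.com/de-at/store/movies/"
    "american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql"
)

# Default encoding is UTF-8, so %C3%A4 becomes a single character (U+00E4).
decoded = unquote(quoted)

# Imitate the Python 2 Unicode-input behaviour: one character per byte.
mojibake = unquote("%C3%A4", encoding="latin-1")

# Already-garbled text can be repaired by reversing the wrong decode.
repaired = mojibake.encode("latin-1").decode("utf-8")

# ascii() keeps the output readable even on a non-UTF-8 terminal.
print(ascii(mojibake), ascii(repaired))
```

The repair line only works when the text was garbled by exactly this Latin-1 round trip. It is a recovery trick, not a substitute for passing the right type to unquote() in the first place.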
