I need to fetch data from a URL with non-ASCII characters, but urllib2.urlopen refuses to open the resource and raises:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0131' in position 26: ordinal not in range(128)
I know the URL is not standards compliant, but I have no way to change it.
How can I access a resource pointed to by a URL containing non-ASCII characters using Python?
Edit: In other words, can urlopen open a URL like the one below, and if so, how?
http://example.org/Ñöñ-ÅŞÇİİ/
Use the iri2uri method of httplib2. It does the same thing as the answer by bobince (is he/she the author of that?)
Based on @darkfeline's answer:
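A minimal sketch of that kind of component-wise conversion, assuming Python 3 and only the standard library. The helper name `iri_to_uri` is hypothetical, and handling the host with Python's built-in `idna` codec is an assumption (it also assumes the netloc carries no non-ASCII userinfo or port):

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(iri):
    """Hypothetical helper: convert a URL with non-ASCII characters to ASCII."""
    scheme, netloc, path, query, fragment = urlsplit(iri)
    # Host names are not percent-encoded; they are converted with IDNA (Punycode).
    netloc = netloc.encode("idna").decode("ascii")
    # Percent-encode the remaining parts as UTF-8, keeping delimiters
    # and any escapes that are already present.
    path = quote(path, safe="/%")
    query = quote(query, safe="=&%+")
    fragment = quote(fragment, safe="%")
    return urlunsplit((scheme, netloc, path, query, fragment))

print(iri_to_uri("http://example.org/Ñöñ-ÅŞÇİİ/"))
```

The result is plain ASCII and can be passed to `urlopen`.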
It is more complex than @bobince's accepted answer suggests: the host, the path, and the query of a URL are not all encoded the same way.
This is how all browsers work; it is specified in https://url.spec.whatwg.org/ - see this example. A Python implementation can be found in w3lib (the library Scrapy uses); see w3lib.url.safe_url_string.
An easy way to check whether a URL-escaping implementation is incorrect or incomplete is to check whether it accepts a 'page encoding' argument.
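To illustrate why the page encoding matters, here is a minimal sketch, assuming Python 3. The path is escaped as UTF-8, while the query is escaped in the encoding of the page the link came from, so the same query text can produce different bytes. The function name `escape_url` and its `page_encoding` parameter are hypothetical and not part of any library:

```python
from urllib.parse import quote, urlsplit, urlunsplit

def escape_url(url, page_encoding="utf-8"):
    """Hypothetical helper: UTF-8 for the path, page encoding for the query."""
    parts = urlsplit(url)
    # The path is percent-encoded as UTF-8 regardless of the page.
    path = quote(parts.path, safe="/%")
    # The query is percent-encoded using the page's own encoding.
    query = quote(parts.query, safe="=&%+", encoding=page_encoding)
    return urlunsplit(parts._replace(path=path, query=query))

url = "http://example.org/é?q=é"
utf8_result = escape_url(url)                # query escaped as UTF-8
latin1_result = escape_url(url, "latin-1")   # query escaped as Latin-1
print(utf8_result)
print(latin1_result)
```

An implementation without such an argument can only ever produce the first result.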
Encode the unicode to UTF-8, then URL-encode.
Works, finally!
I could not avoid these strange characters, but in the end I got through it.
In Python 3, use the urllib.parse.quote function on the non-ASCII string:
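A minimal sketch of that call, assuming the example URL from the question. Only the path is quoted here, since quoting the whole URL would also escape the `:` after the scheme; `quote` encodes to UTF-8 by default and leaves `/` untouched:

```python
from urllib.parse import quote, urlsplit, urlunsplit

url = "http://example.org/Ñöñ-ÅŞÇİİ/"

# Split the URL so that only the path is percent-encoded.
parts = urlsplit(url)
ascii_url = urlunsplit(parts._replace(path=quote(parts.path)))

print(ascii_url)
```

The resulting `ascii_url` contains only ASCII characters and can be handed to `urllib.request.urlopen`.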