(Python 3.4.2) Would anyone be able to help me fetch https pages with urllib? I've spent hours trying to figure this out.
Here's what I'm trying to do (pretty basic):
import urllib.request
url = "".join((baseurl, other_string, midurl, query))
response = urllib.request.urlopen(url)
html = response.read()
Here's my error output when I run it:
File "./script.py", line 124, in <module>
response = urllib.request.urlopen(url)
File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 455, in open
response = self._open(req, data)
File "/usr/lib/python3.4/urllib/request.py", line 478, in _open
'unknown_open', req)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 1244, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: 'https>
I've also tried using data=None to no avail:
response = urllib.request.urlopen(url, data=None)
I've also tried this:
import urllib.request, ssl
https_sslv3_handler = urllib.request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_SSLv3))
opener = urllib.request.build_opener(https_sslv3_handler)
urllib.request.install_opener(opener)
resp = opener.open(url)
html = resp.read().decode('utf-8')
print(html)
A similar error occurs with this script: it fails on the "resp = ..." line and complains that 'https' is an unknown url type.
Python was compiled with SSL support on my computer (Arch Linux). I've tried reinstalling python3 and openssl a few times, but that doesn't help. I haven't tried to uninstall python completely and then reinstall because I would also need to uninstall a lot of other programs on my computer.
Anyone know what's going on?
-----EDIT-----
I figured it out, thanks to help from Andrew Stevlov's answer. My url had a ":" in it, and I guess urllib didn't like that. I replaced it with "%3A" and now it's working. Thanks so much guys!!!
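For anyone who would rather not replace characters by hand, here is a minimal sketch of the same fix using urllib.parse.quote. The baseurl and query values are made up for illustration, since the real ones are not shown above:

```python
from urllib.parse import quote

# Hypothetical pieces; the question does not show the real values.
baseurl = "https://example.com/search?q="
query = "time:12:30"

# safe="" makes quote() percent-encode every reserved character,
# including ":" (which becomes "%3A") and "/".
url = baseurl + quote(query, safe="")
print(url)
```

Only the variable part should be quoted. Quoting the whole URL would also escape the ":" after "https".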
I had the same error when I tried to open a url with https, but no errors with http.
This was done on Ubuntu 16.04 using Python 3.7. Ubuntu ships Python 3.5 in /usr/bin by default, and I had previously built 3.7 from source and installed it in /usr/local/bin. The fact that there was no error with 3.5 pointed to OpenSSL (/usr/bin/openssl) not being picked up correctly by the 3.7 build, which is also evident below:
By consulting this link, I changed SSL=/usr/local/ssl to SSL=/usr in Modules/Setup.dist in the 3.7 source directory, copied that file to Setup, and then rebuilt Python 3.7.
Now it is fixed:
and 3.7 has been compiled with OpenSSL support successfully. Note that running "openssl version" in the Ubuntu shell does not tell the whole story; you have to check that the ssl module actually loads inside Python.
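One way to do that check from inside Python is sketched below. The helper name https_support is my own, not something from the standard library, and this is not the exact output shown above. It relies on the fact that urllib.request only defines HTTPSHandler when the ssl module is available:

```python
import urllib.request


def https_support():
    """Return the OpenSSL version this interpreter was built against, or None."""
    try:
        import ssl
    except ImportError:
        # Python was built without the _ssl extension module.
        return None
    if not hasattr(urllib.request, "HTTPSHandler"):
        # urllib.request could not set up its https handler.
        return None
    return ssl.OPENSSL_VERSION


print(https_support())
```

If this prints None for the interpreter in /usr/local/bin but a version string for the one in /usr/bin, the build is the likely culprit.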
Double-check your compilation options; it looks like something is wrong with your box.
At least the following code works for me:
This may help:
Ignore SSL certificate errors
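A rough sketch of what ignoring certificate errors usually looks like with the standard library. The function names are mine, and the context argument to urlopen needs Python 3.4.3 or later. This turns off certificate verification, so it only addresses certificate failures and is probably not related to an "unknown url type" error:

```python
import ssl
import urllib.request


def unverified_context():
    """Build an SSL context that skips certificate and hostname checks."""
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be relaxed.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url):
    """Fetch url without verifying the server certificate."""
    with urllib.request.urlopen(url, context=unverified_context()) as resp:
        return resp.read()
```

Usage would be something like html = fetch(url), with url being whatever you already build in your script.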
The 'https and not https in the error message indicates that you did not try a http:// request but instead a 'https:// request which of course does not exist. Check how you construct your URL.
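A small sketch of how such a problem could be caught before calling urlopen. check_scheme is a hypothetical helper name, and the example URLs are placeholders:

```python
from urllib.parse import urlsplit


def check_scheme(url):
    """Return the URL scheme, or raise if it is not plain http/https."""
    scheme = urlsplit(url).scheme
    if scheme not in ("http", "https"):
        # %r shows stray quotes or whitespace that print() would hide.
        raise ValueError("unexpected scheme %r in URL %r" % (scheme, url))
    return scheme


print(check_scheme("https://example.com/"))

try:
    check_scheme("'https://example.com/")  # note the stray leading quote
except ValueError as exc:
    print(exc)
```

Printing repr(url) just before the request is an even quicker way to spot a stray quote.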