(Python 3.4.2)
Would anyone be able to help me fetch https pages with urllib? I've spent hours trying to figure this out.
Here's what I'm trying to do (pretty basic):
import urllib.request
url = "".join((baseurl, other_string, midurl, query))
response = urllib.request.urlopen(url)
html = response.read()
Here's my error output when I run it:
File "./script.py", line 124, in <module>
response = urllib.request.urlopen(url)
File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 455, in open
response = self._open(req, data)
File "/usr/lib/python3.4/urllib/request.py", line 478, in _open
'unknown_open', req)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 1244, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: 'https>
I've also tried using data=None to no avail:
response = urllib.request.urlopen(url, data=None)
I've also tried this:
import urllib.request, ssl
https_sslv3_handler = urllib.request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_SSLv3))
opener = urllib.request.build_opener(https_sslv3_handler)
urllib.request.install_opener(opener)
resp = opener.open(url)
html = resp.read().decode('utf-8')
print(html)
A similar error occurs with this^ script, where the error is found on the "resp = ..." line and complains that 'https' is an unknown url type.
Python was compiled with SSL support on my computer (Arch Linux). I've tried reinstalling python3 and openssl a few times, but that doesn't help. I haven't tried to uninstall python completely and then reinstall because I would also need to uninstall a lot of other programs on my computer.
Anyone know what's going on?
-----EDIT-----
I figured it out, thanks to help from Andrew Stevlov's answer. My url had a ":" in it, and I guess urllib didn't like that. I replaced it with "%3A" and now it's working. Thanks so much guys!!!
Double check your compilation options, looks like something is wrong with your box.
At least the following code works for me:
from urllib.request import urlopen
resp = urlopen('https://github.com')
print(resp.read())
urllib.error.URLError: <urlopen error unknown url type: 'https>
The 'https
and not https
in the error message indicates that you did not try a http://
request but instead a 'https://
request which of course does not exist. Check how you construct your URL.
this may help
Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()
I had the same error when I tried to open a url with https, but no errors with http.
>>> from urllib.request import urlopen
>>> urlopen('http://google.com')
<http.client.HTTPResponse object at 0xb770252c>
>>> urlopen('https://google.com')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/usr/local/lib/python3.7/urllib/request.py", line 548, in _open
'unknown_open', req)
File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.7/urllib/request.py", line 1387, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: https>
This was done on Ubuntu 16.04 using Python 3.7. The native Ubuntu defaults to Python 3.5 in /usr/bin and previously I had source downloaded and upgraded to 3.7 in /usr/local/bin. The fact that there was no error for 3.5 pointed to the executable /usr/bin/openssl not being installed correctly in 3.7 which is also evident below:
>>> import ssl
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/ssl.py", line 98, in <module>
import _ssl # if we can't import it, let the error propagate
ModuleNotFoundError: No module named '_ssl'
By consulting this link, I changed SSL=/usr/local/ssl to SSL=/usr in 3.7 source dir's Modules/Setup.dist and also cp it into Setup and then rebuilt Python 3.7.
$ ./configure
$ make
$ make install
Now it is fixed:
>>> import ssl
>>> ssl.OPENSSL_VERSION
'OpenSSL 1.0.2g 1 Mar 2016'
>>> urlopen('https://www.google.com')
<http.client.HTTPResponse object at 0xb74c4ecc>
>>> urlopen('https://www.google.com').read()
b'<!doctype html>...
and 3.7 has been complied with OpenSSL support successfully. Note that the Ubuntu command "openssl version" is not complete until you load it into Python.