Downloading HTTPS pages with urllib, error:1407743

2019-06-21 10:51发布

问题:

Im using newest Kubuntu with Python 2.7.6. I try to download a https page using the below code:

import urllib2

hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'pl-PL,pl;q=0.8',
       'Connection': 'keep-alive'}

req = urllib2.Request(main_page_url, headers=hdr)

try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print e.fp.read()

content = page.read()
print content

However, I get such an error:

Traceback (most recent call last):
  File "test.py", line 33, in <module>
    page = urllib2.urlopen(req)
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1222, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:510: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error>

How to solve this?

SOLVED!

I used url https://www.ssllabs.com given by @SteffenUllrich. It turned out that the server uses TLS 1.2, so I updated python to 2.7.10 and modified my code to:

import ssl
import urllib2

context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)

hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'pl-PL,pl;q=0.8',
       'Connection': 'keep-alive'}

req = urllib2.Request(main_page_url, headers=hdr)

try:
    page = urllib2.urlopen(req,context=context)
except urllib2.HTTPError, e:
    print e.fp.read()

content = page.read()
print content

Now it downloads the page.

回答1:

Im using newest Kubuntu with Python 2.7.6

The latest Kubuntu (15.10) uses 2.7.10 as far as I know. But assuming you use 2.7.6 which is contained in 14.04 LTS:

Works with facebook for me too, so it's probably the page issue. What now?

Then it depends on the site. Typical problems with this version of Python is missing support for Server Name Indication (SNI) which was only added to Python 2.7.9. Since lots of sites require SNI today (like everything using Cloudflare Free SSL) I guess this is the problem.

But, there are also other possibilities like multiple trust path which is only fixed with OpenSSL 1.0.2. Or simply missing intermediate certificates etc. More information and maybe also workarounds are only possible if you provide the URL or you analyze the situation yourself based on this information and the analysis from SSLLabs.