Requests, Mechanize, urllib fail but cURL works

Posted 2019-03-22 16:46

Question:

Whilst attempting to access this site through requests, I receive:

('Connection aborted.', error(54, 'Connection reset by peer'))

I have also tried to access the site through mechanize and urllib, but both failed. However, cURL works fine (see the end for the command and its output).

I have tried requests.get() with combinations of the parameters verify=True and stream=True, and I have also tried a request with the cURL headers.

I tried to move to urllib / Mechanize as alternatives but both gave the same error.

My code for requests is as follows:

import requests
import cookielib

url = "https://datamuster.marketdatasuite.com/Account/LogOn?ReturnUrl=%2fProfile%2fList"

header = {
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding':'gzip,deflate,sdch',
    'Accept-Language':'en-US,en;q=0.8',
    'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36'
}

jar = cookielib.CookieJar()
s = requests.Session()
s.headers.update(header)

r = s.get(url, cookies=jar)

cURL test with headers:

$ curl -v -I -H "....Testing: Header...." https://datamuster.marketdatasuite.com/Account/LogOn?ReturnUrl=%2fProfile%2fList

* Hostname was NOT found in DNS cache
*   Trying 54.252.86.7...
* Connected to datamuster.marketdatasuite.com (54.252.86.7) port 443 (#0)
* TLS 1.2 connection using TLS_RSA_WITH_AES_128_CBC_SHA256
* Server certificate: datamuster.marketdatasuite.com
* Server certificate: COMODO SSL CA
* Server certificate: AddTrust External CA Root
> HEAD /Account/LogOn?ReturnUrl=%2fProfile%2fList HTTP/1.1
> User-Agent: curl/7.37.1
> Host: datamuster.marketdatasuite.com
> Accept: */*
> ....Testing: Header....
> 
< HTTP/1.1 200 OK

Answer 1:

The server requires SNI (Server Name Indication) and simply closes the connection if the client does not send it. It looks like curl uses SNI, while the version of the requests library you are using does not.

You can try this with OpenSSL. Without SNI you get an error:

$ openssl s_client -connect datamuster.marketdatasuite.com:443
CONNECTED(00000003)
write:errno=104

But if you use SNI (-servername ...) then it works:

CONNECTED(00000003)
depth=1 C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO SSL CA
...
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : AES128-SHA256
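
The same comparison can be sketched in Python with the standard ssl module. This is an illustration rather than part of the original answer: the helper name try_handshake is hypothetical, the code assumes Python 3, and certificate verification is switched off only so that the no-SNI case can be attempted at all.

```python
import socket
import ssl


def try_handshake(host, port=443, use_sni=True, timeout=10.0):
    """Attempt a TLS handshake and return the negotiated protocol version.

    Hypothetical helper for illustration. Raises an OSError subclass
    (for example ssl.SSLError or ConnectionResetError) on failure.
    """
    context = ssl.create_default_context()
    # Verification is disabled ONLY so a handshake without a server
    # name is allowed; do not do this for real traffic.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    # Passing server_hostname is what makes the client send SNI.
    server_hostname = host if use_sni else None
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=server_hostname) as tls:
            return tls.version()
```

If the server still behaves as described, a call with use_sni=False would be expected to raise, while use_sni=True would be expected to return a protocol name; neither result has been verified here.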

According to the requests FAQ, SNI is not supported with Python 2, only with Python 3. See this resource for information on how to make SNI possible with Python 2.
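
As a quick check of what the local interpreter offers, the standard library exposes ssl.HAS_SNI. The sketch below is an addition, not part of the answer; sni_support is a hypothetical name, and the flag only describes the built-in ssl module, not whether requests has been set up to use pyOpenSSL instead.

```python
import ssl
import sys


def sni_support():
    """Report the interpreter version, OpenSSL build, and SNI flag.

    Hypothetical helper. ssl.HAS_SNI is missing on old interpreters,
    so a missing attribute is treated as "no SNI support".
    """
    return {
        "python": sys.version.split()[0],
        "openssl": ssl.OPENSSL_VERSION,
        "has_sni": bool(getattr(ssl, "HAS_SNI", False)),
    }


if __name__ == "__main__":
    print(sni_support())
```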



Answer 2:

This process wasn't totally straightforward so I thought I'd post a new answer to make it easy to follow for others.

Following this thread, I needed to install these libraries to get SNI to work with Python 2:

  • pyOpenSSL
  • ndg-httpsclient
  • pyasn1

However, pyOpenSSL may cause problems when installed with pip install pyOpenSSL. I actually had to remove my existing pyOpenSSL installation, since pyOpenSSL version 0.14 didn't seem to work:

pip uninstall pyOpenSSL

The following command installed all necessary dependencies:

pip install pyOpenSSL==0.13 ndg-httpsclient pyasn1

This should now get requests to work with SNI on Python 2.
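
To confirm that the three packages are importable after installation, a small check along these lines may help. It is a sketch added for illustration: missing_sni_packages is a hypothetical name, and the mapping assumes the usual import names (OpenSSL for pyOpenSSL, ndg.httpsclient for ndg-httpsclient).

```python
import importlib

# Distribution name -> module name it is assumed to install.
SNI_PACKAGES = {
    "pyOpenSSL": "OpenSSL",
    "ndg-httpsclient": "ndg.httpsclient",
    "pyasn1": "pyasn1",
}


def missing_sni_packages(packages=SNI_PACKAGES):
    """Return the distribution names whose module cannot be imported."""
    missing = []
    for dist_name, module_name in sorted(packages.items()):
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(dist_name)
    return missing


if __name__ == "__main__":
    print("Missing:", missing_sni_packages() or "none")
```

An empty result only shows that the imports succeed; it does not prove that requests is actually sending SNI.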


Keep reading for the issues with pyOpenSSL ver. 0.14...

When installing ver. 0.14 I get the following error:

Command /usr/local/opt/python/bin/python2.7 -c "import setuptools, tokenize;__file__='/private/var/folders/04/3f_y5fw166v03k7b51j1tsl80000gn/T/pip_build_alex/cryptography/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/04/3f_y5fw166v03k7b51j1tsl80000gn/T/pip-7QR71B-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /private/var/folders/04/3f_y5fw166v03k7b51j1tsl80000gn/T/pip_build_alex/cryptography
Storing debug log for failure in /Users/alex/.pip/pip.log

and pyOpenSSL ver. 0.14 is left incompletely installed:

$ pip show pyOpenSSL
---
Name: pyOpenSSL
Version: 0.14
Location: /usr/local/lib/python2.7/site-packages
Requires: cryptography, six

as can be seen from the requests.get() attempt:

import requests
response = requests.get("http://datamuster.marketdatasuite.com")

(...lots of errors...)
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(54, 'Connection reset by peer'))

The following commands revert to pyOpenSSL ver. 0.13 and correct the issue:

pip uninstall pyOpenSSL
pip install pyOpenSSL==0.13

and then in python:

import requests
requests.get("http://datamuster.marketdatasuite.com")

<Response [200]>


Answer 3:

I got the same issue when using Python requests to send test requests to WordPress (I installed WordPress on a dedicated server). I tried updating the SSL packages, without success.

Then I realised that the server was slow to respond to these requests. The requests with long delays were always "kicked off", which caused ('Connection aborted.', error(54, 'Connection reset by peer')). It turned out that the web server (Apache) reset the connection while the request was still waiting for a response.

I increased KeepAliveTimeout from 5 seconds to 20 seconds (in the Apache web server) and never got this error again.

Improved code for exceptions: Increasing KeepAliveTimeout worked in most of the tests. However, in some tests I still got the same error and the program stopped. I added code that catches the exception and repeats the request when it occurs.

import requests
...
# Keep resending the request until it succeeds.
while True:
    try:
        r = session.get(requestURL, headers=headers, timeout=None)
        break  # success: leave the retry loop
    except requests.exceptions.ConnectionError:
        # The server reset the connection: report it and try again.
        print("'Connection aborted.', error(54, 'Connection reset by peer')")
        print("\tResend request...")
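
The loop above retries forever, which can hang if the server never recovers. A bounded variant with a growing delay between attempts might look like the sketch below. It is not from the original answer: retry, attempts, base_delay, retry_on and sleep are hypothetical names, and the default of catching OSError relies on requests exceptions deriving from IOError (an alias of OSError in Python 3).

```python
import time


def retry(func, attempts=5, base_delay=0.5, retry_on=(OSError,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Hypothetical helper. Waits base_delay, then twice as long after
    each further failure; the last failure is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))


# Possible use with the names from the answer (not executed here):
# r = retry(lambda: session.get(requestURL, headers=headers, timeout=10),
#           retry_on=(requests.exceptions.ConnectionError,))
```

Passing a finite timeout to session.get, rather than timeout=None, also keeps a single attempt from blocking indefinitely.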

Hope this will help!