HTTPS proxy support in the Python Requests library

Posted 2019-04-08 16:00

I am using the Python Requests library for HTTP requests. I run the free ntlmaps proxy on my computer to answer the NTLM challenges from our corporate ISA server. However, the response always seems to be empty, as shown below:

>>> import requests
>>> r = requests.get('https://www.google.com')
>>> r.text
u'<HTML></HTML>\r\n'

Plain http requests do not have this problem, and the urllib2 library gets the correct response. I compared the messages sent by Requests and urllib2 and found that Requests uses GET while urllib2 uses CONNECT, as shown in the raw proxy logs captured below (the first is from the Requests library). Does anybody know of a solution? Is this a bug in the Requests library? Thanks in advance.

22.10.2012 11:01:41 Version 0.9.9.0.1
*** Got client request header.
*** Client header:
=====
GET https://www.google.com/ HTTP/1.1
Host: www.google.com
Proxy-Connection: Keep-Alive
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/0.14.1 CPython/2.7.2 Darwin/12.1.0

*** Client request header does not have 'Content-Length' or 'Transfer-Encoding' parameter and it must not have any body.
*** Replacing values in client header...Done.
*** New client header:
=====
GET https://www.google.com/ HTTP/1.1
Host: www.google.com
Proxy-Connection: Keep-Alive
Accept-Encoding: gzip, deflate, compress
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)

*** Connecting to remote server...(10.220.15.36:9000)...Done.
*** Sending client request header to remote server...Done.
*** Got remote server response header.
*** Remote server header:
=====
HTTP/1.0 200 OK
Content-Type: text/html
Refresh: 0; URL=https://www.google.com/

*** Could not find server 'Content-Length' parameter.
*** Authentication routine started.
*** Authentication not required.
*** Authentication routine finished.
*** Sending remote server response header to client...Done.
*** Sent 15 bytes to client. (all - 0, len - 0)
*** Remote server closed connection. (Server buffer - 0 bytes)
*** No server's data to send to the client. (server's buffer - 0 bytes)
*** Termination conditions detected (remote server closed connection). Stop Request issued.
*** Finishing procedure started.
*** Closing thread...Done.

The message sent by the urllib2 library:

22.10.2012 11:03:49 Version 0.9.9.0.1
*** Got client request header.
*** Client header:
=====
CONNECT www.google.com:443 HTTP/1.0

*** Client request header does not have 'Content-Length' or 'Transfer-Encoding' parameter and it must not have any body.
*** Replacing values in client header...Done.
*** New client header:
=====
CONNECT www.google.com:443 HTTP/1.0
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)

*** Connecting to remote server...(10.220.15.36:9000)...Done.
*** Sending client request header to remote server...Done.
*** Got remote server response header.
*** Remote server header:
=====
HTTP/1.1 407 Proxy Authentication Required ( The ISA Server requires authorization to fulfill the request. Access to the Web Proxy service is denied. )
Via: 1.1 LASISA2
Proxy-Authenticate: Negotiate
Proxy-Authenticate: Kerberos
Proxy-Authenticate: NTLM
Connection: close
Proxy-Connection: close
Pragma: no-cache
Cache-Control: no-cache
Content-Type: text/html
Content-Length: 718

*** Server 'Content-Length' found to be 718.
*** Authentication routine started.
*** Got Error 407 - "Proxy authentication required".
*** Authentication methods allowed: Negotiate, Kerberos, NTLM
*** Using NTLM authentication method.
*** Authorization in progress...
*** Closing connection to the remote server...Done.
*** Building environment for NTLM.
*** Using custom NTLM flags: 06820000
*** NTLM version with LM response only.
*** NTLM Domain/Host/User: IGTMASTER/BEATLES.LOCAL/TFSBVTVA
*** NTLM hashed passwords found.
*** Environment has been built successfully.
*** Connecting to remote server...(10.220.15.36:9000)...Done.
*** Resetting remote server status...Done. (Server buffer - 651 bytes)
*** Remote server buffer flushed.
*** Fake NTLM header with Msg1:
=====
CONNECT www.google.com:443 HTTP/1.0
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
Proxy-Connection: Keep-Alive
Proxy-Authorization: NTLM TlRMTVNTUAABAAAABoIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMAAAAAAAAAAwAAAA

*** Sending Fake NTLM header with Msg1...Done.
*** There must be no body to send.
*** Waiting for message 2 from remote server...
*** Got remote server response header.
*** Remote server header:
=====
HTTP/1.1 407 Proxy Authentication Required ( Access is denied. )
Via: 1.1 LASISA2
Proxy-Authenticate: NTLM TlRMTVNTUAACAAAACQAJADgAAAAGgoECnmQdttSFW6oAAAAAAAAAAJAAkABBAAAABQLODgAAAA9JR1RNQVNURVICABIASQBHAFQATQBBAFMAVABFAFIAAQAOAEwAQQBTAEkAUwBBADIABAAaAGkAcwAuAGEAZAAuAGkAZwB0AC4AYwBvAG0AAwAqAGwAYQBzAGkAcwBhADIALgBpAHMALgBhAGQALgBpAGcAdAAuAGMAbwBtAAUAFABhAGQALgBpAGcAdAAuAGMAbwBtAAAAAAA=
Connection: Keep-Alive
Proxy-Connection: Keep-Alive
Pragma: no-cache
Cache-Control: no-cache
Content-Type: text/html
Content-Length: 0

*** Server 'Content-Length' found to be 0.
*** Got NTLM message 2 from remote server.
*** Resetting remote server status...Done. (Server buffer - 0 bytes)
*** Remote server buffer flushed.
*** Sending Fake NTLM header (not body) with Msg3...Done.
*** Fake NTLM header with Msg3:
=====
CONNECT www.google.com:443 HTTP/1.0
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
Proxy-Authorization: NTLM TlRMTVNTUAADAAAAGAAYAF4AAAAAAAAAdgAAAAkACQBAAAAACAAIAEkAAAANAA0AUQAAAAAAAAB2AAAABoIAAElHVE1BU1RFUlRGU0JWVFZBQkVBVExFUy5MT0NBTMwaDvCTdLkOsE7vD6Tog1RoolpOLnh4WQ==

*** End of NTLM authorization process.
*** Authentication routine finished.
*** Got remote server response header.
*** Remote server header:
=====
HTTP/1.1 200 Connection established
Proxy-Connection: close
Connection: close
Via: 1.1 LASISA2

*** Remote server response to the 'CONNECT' request. It must not have any body.
*** Authentication routine started.
*** Authentication not required.
*** Authentication routine finished.
*** Sending remote server response header to client...Done.
*** Lowered authentication flags down. As the code is neither 401 nor 407.
*** Successful 'CONNECT' request detected. Going to tunnel mode.
*** Resetting client status...Done. (Client buffer - 114 bytes)
*** Resetting remote server status...Done. (Server buffer - 0 bytes)
*** Request completed.
*** Tunnelled 114 bytes to remote server.
*** Tunnelled 1725 bytes to client.
*** Tunnelled 186 bytes to remote server.
*** Tunnelled 47 bytes to client.
*** Tunnelled 142 bytes to remote server.
*** Tunnelled 4096 bytes to client.
*** Tunnelled 248 bytes to client.
*** Tunnelled 2076 bytes to client.
*** Tunnelled 4096 bytes to client.
*** Tunnelled 1198 bytes to client.
*** Remote server closed connection. (Server buffer - 0 bytes)
*** Termination conditions detected (remote server closed connection). Stop Request issued.
*** Finishing procedure started.
*** Closing thread...Done.

2 Answers
做自己的国王
#2 · 2019-04-08 16:47

An HTTPS request through a proxy should use CONNECT; it is intentional that urllib2 does it that way. CONNECT establishes the tunnel through the proxy that HTTPS requires for secure transmission.
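To make the difference between the two logs concrete, here is a minimal sketch of the first line a client sends to an HTTP proxy in each case. The function name `proxy_request_line` is hypothetical and only for illustration; it is not part of Requests or urllib2, and it is written for Python 3 (`urllib.parse`) rather than the Python 2.7 used in the question.

```python
from urllib.parse import urlsplit


def proxy_request_line(url, http_version="HTTP/1.1"):
    """Return the request line a client would send to an HTTP proxy for `url`.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        # HTTPS: ask the proxy to open a raw tunnel to host:port.
        # The TLS handshake then happens inside that tunnel.
        port = parts.port or 443
        return "CONNECT {}:{} {}".format(parts.hostname, port, http_version)
    # Plain HTTP: send the absolute URL and let the proxy fetch it.
    return "GET {} {}".format(url, http_version)


print(proxy_request_line("https://www.google.com/", "HTTP/1.0"))
print(proxy_request_line("http://www.google.com/"))
```

The first call produces the same `CONNECT www.google.com:443 HTTP/1.0` line that appears in the urllib2 log. The first log shows Requests sending the absolute-URL `GET` form for an https URL instead.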

我只想做你的唯一
#3 · 2019-04-08 16:53

As I understand it, this is a bug in urllib3, which Requests uses under the hood. See this bug report: https://github.com/shazow/urllib3/issues/50
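If you need a CONNECT tunnel in the meantime, the standard library can request one explicitly. The following is only a sketch under assumptions: it targets Python 3, where `httplib` became `http.client`; the helper name `tunnelled_connection` is hypothetical; and the local ntlmaps listener is assumed to be at 127.0.0.1:5865, so substitute your own address and port.

```python
import http.client


def tunnelled_connection(proxy_host, proxy_port, target_host, target_port=443):
    """Build an HTTPS connection that goes through an HTTP proxy via CONNECT.

    Hypothetical helper for illustration. Nothing is sent until the first
    request is made on the returned connection.
    """
    # The socket is opened to the proxy, not to the target server.
    conn = http.client.HTTPSConnection(proxy_host, proxy_port, timeout=10)
    # set_tunnel() makes http.client send "CONNECT target_host:target_port"
    # to the proxy first, then start TLS with the target inside the tunnel.
    conn.set_tunnel(target_host, target_port)
    return conn


# Usage (assumed proxy address; requires the local proxy to be running):
# conn = tunnelled_connection("127.0.0.1", 5865, "www.google.com")
# conn.request("GET", "/")
# print(conn.getresponse().status)
```

This bypasses Requests entirely, so it is a workaround for testing the proxy path rather than a fix for the behaviour described in the bug report.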
