Browsing a NTLM protected website using python wit

2019-06-25 23:22发布

I have been tasked with creating a script that logs on to a corporate portal goes to a particular page, downloads the page, compares it to an earlier version and then emails a certain person depending on changes that have been made. The last parts are easy enough but it has been the first step that is giving me the most trouble.

After unsuccessfully using urllib2(I am trying to do this in python) to connect and about 4 or 5 hours of googling I have determined that the reason I can't connect is due to NTLM authentication on the web page. I have tried a bunch of different processes for connecting found on this site and others to no avail. Based on the NTLM example I have done:

import urllib2
from ntlm import HTTPNtlmAuthHandler

user = 'username'
password = "password"
url = "https://portal.whatever.com/"

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
# create the NTLM authentication handler
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)

# create and install the opener
opener = urllib2.build_opener(auth_NTLM)
urllib2.install_opener(opener)

# create a header
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
header = { 'Connection' : 'Keep-alive', 'User-Agent' : user_agent}

response = urllib2.urlopen(urllib2.Request(url, None, header))

When I run this (with a real username, password and url) I get the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ntlm2.py", line 21, in <module>
    response = urllib2.urlopen(urllib2.Request(url, None, header))
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 438, in error
     return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
     result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 521, in http_error_default
     raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
  urllib2.HTTPError: HTTP Error 401: Unauthorized

The thing that is most interesting about this trace to me is that the final line says a 401 error was sent back. From what I have read the 401 error is the first message sent back to the client when NTLM is started. I was under the impression that the purpose of python-ntml was to handle the NTLM process for me. Is that wrong or am I just using it incorrectly? Also I'm not bounded to using python for this, so if there is an easier way to do this in another language let me know (From what I seen a-googling there isn't). Thanks!

标签: python ntlm
1条回答
唯我独甜
2楼-- · 2019-06-25 23:43

If the site is using NTLM authentication, the headers attribute of the resulting HTTPError should say so:

>>> try:
...   handle = urllib2.urlopen(req)
... except IOError, e:
...   print e.headers
... 
<other headers>
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
查看更多
登录 后发表回答