I have been tasked with creating a script that logs on to a corporate portal, goes to a particular page, downloads it, compares it to an earlier version, and then emails a certain person depending on the changes that have been made. The last parts are easy enough, but the first step is giving me the most trouble.
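The comparison and notification steps described above can be done with the standard library alone. A minimal Python 3 sketch, assuming the old and new copies of the page are available as strings: `page_diff` and `build_change_email` are hypothetical helper names, the addresses are placeholders, and actually sending the message (for example with `smtplib.SMTP(host).send_message(msg)`) depends on your mail server, so it is left out.

```python
import difflib
from email.message import EmailMessage


def page_diff(old_html, new_html):
    """Return unified-diff lines between two saved copies of the page.

    An empty list means the page has not changed.
    """
    return list(
        difflib.unified_diff(
            old_html.splitlines(),
            new_html.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",  # splitlines() already removed the newlines
        )
    )


def build_change_email(diff_lines, sender, recipient):
    """Build a notification email, or return None when nothing changed."""
    if not diff_lines:
        return None
    msg = EmailMessage()
    msg["Subject"] = "Portal page changed"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(diff_lines))
    return msg


if __name__ == "__main__":
    old = "<p>Status: open</p>\n<p>Owner: Alice</p>"
    new = "<p>Status: closed</p>\n<p>Owner: Alice</p>"
    diff = page_diff(old, new)
    print("\n".join(diff))
    # Placeholder addresses; sending is left to smtplib and your mail server.
    msg = build_change_email(diff, "watcher@example.com", "owner@example.com")
    print(msg["Subject"])
```

Choosing recipients "depending on changes" could then be a matter of inspecting the `+`/`-` lines of the diff before building the message.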
After failing to connect with urllib2 (I am trying to do this in Python) and about 4 or 5 hours of googling, I have determined that the reason I can't connect is NTLM authentication on the web page. I have tried several different approaches for connecting, found on this site and others, to no avail. Based on the python-ntlm example, I have written:
import urllib2
from ntlm import HTTPNtlmAuthHandler
user = 'username'
password = "password"
url = "https://portal.whatever.com/"
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
# create the NTLM authentication handler
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)
# create and install the opener
opener = urllib2.build_opener(auth_NTLM)
urllib2.install_opener(opener)
# create a header
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
header = { 'Connection' : 'Keep-alive', 'User-Agent' : user_agent}
response = urllib2.urlopen(urllib2.Request(url, None, header))
When I run my script with a real username, password, and URL, I get the following:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ntlm2.py", line 21, in <module>
    response = urllib2.urlopen(urllib2.Request(url, None, header))
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Unauthorized
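The traceback shows urllib2 following two 302 redirects before the final 401. A minimal diagnostic sketch in Python 3 (where urllib2 became `urllib.request`) records where those redirects go and shows that `HTTPPasswordMgrWithDefaultRealm` matches stored credentials by host, port, and path prefix. If the portal redirects to another host (`login.whatever.com` here is a hypothetical example), an auth handler that looks credentials up by the redirected URL, as urllib's own auth handlers do, would find none. Whether python-ntlm's handler looks them up the same way is an assumption worth checking; `auth_handler` in the comments is a hypothetical name for whatever NTLM handler you install.

```python
from urllib.request import (
    HTTPPasswordMgrWithDefaultRealm,
    HTTPRedirectHandler,
    Request,
    build_opener,
)


class RecordingRedirectHandler(HTTPRedirectHandler):
    """Follows redirects as usual but remembers each (status, target URL)."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.targets.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# Credentials registered the same way as in the question's script.
passman = HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, "https://portal.whatever.com/", "username", "password")

if __name__ == "__main__":
    # A URL under the registered prefix finds the credentials.
    print(passman.find_user_password(None, "https://portal.whatever.com/some/page"))
    # -> ('username', 'password')

    # A different (hypothetical) host finds nothing.
    print(passman.find_user_password(None, "https://login.whatever.com/"))
    # -> (None, None)

    # Real use needs network access: build_opener() uses the recorder in place
    # of its default redirect handler, so after the HTTPError is raised,
    # recorder.targets lists every redirect that was followed.
    # recorder = RecordingRedirectHandler()
    # opener = build_opener(recorder, auth_handler)
```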
The most interesting thing about the traceback, to me, is that the final line reports a 401 error. From what I have read, a 401 is the first response the server sends back to the client when NTLM authentication starts. I was under the impression that the purpose of python-ntlm was to handle the NTLM handshake for me. Is that wrong, or am I just using it incorrectly? Also, I'm not bound to Python for this, so if there is an easier way to do it in another language, let me know (from what I've seen while googling, there isn't). Thanks!
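Over HTTP, NTLM is indeed a multi-step handshake in which the first 401 is expected. The client answers the server's `WWW-Authenticate: NTLM` with `Authorization: NTLM <base64 NEGOTIATE message>`, the server replies with a second 401 carrying a CHALLENGE message, and the client then sends an AUTHENTICATE message derived from the password, all on the same kept-alive connection. A handler such as python-ntlm's is meant to automate these steps. As an illustration only, a minimal Python 3 sketch of the first client step, building the NEGOTIATE (Type 1) message laid out in Microsoft's MS-NLMP specification, might look like this; the particular flag selection is an assumption, and parsing the challenge and computing the AUTHENTICATE message are deliberately omitted.

```python
import base64
import struct

# Negotiate flag values from the MS-NLMP specification.
NTLMSSP_NEGOTIATE_UNICODE = 0x00000001
NTLM_NEGOTIATE_OEM = 0x00000002
NTLMSSP_REQUEST_TARGET = 0x00000004
NTLMSSP_NEGOTIATE_NTLM = 0x00000200
NTLMSSP_NEGOTIATE_ALWAYS_SIGN = 0x00008000
NTLMSSP_NEGOTIATE_EXTENDED_SESSIONSECURITY = 0x00080000

# Assumed default flag set for this sketch.
DEFAULT_FLAGS = (
    NTLMSSP_NEGOTIATE_UNICODE
    | NTLM_NEGOTIATE_OEM
    | NTLMSSP_REQUEST_TARGET
    | NTLMSSP_NEGOTIATE_NTLM
    | NTLMSSP_NEGOTIATE_ALWAYS_SIGN
    | NTLMSSP_NEGOTIATE_EXTENDED_SESSIONSECURITY
)


def ntlm_negotiate_message(flags=DEFAULT_FLAGS):
    """Build a minimal 32-byte NEGOTIATE (Type 1) message.

    Domain and workstation fields are left empty and no Version field is sent.
    """
    empty_field = struct.pack("<HHI", 0, 0, 0)  # length, max length, offset
    return (
        b"NTLMSSP\x00"              # signature
        + struct.pack("<I", 1)      # MessageType 1 = NEGOTIATE
        + struct.pack("<I", flags)  # NegotiateFlags, little-endian
        + empty_field               # DomainNameFields
        + empty_field               # WorkstationFields
    )


def authorization_header(message):
    """Wrap an NTLM message the way it travels in an HTTP Authorization header."""
    return "NTLM " + base64.b64encode(message).decode("ascii")


if __name__ == "__main__":
    # Starts with "NTLM TlRMTVNTUAAB", the base64 of the signature and type.
    print(authorization_header(ntlm_negotiate_message()))
```

If the server's first 401 does not list `NTLM` (for example, only `Negotiate`), a pure-NTLM client may never get past this step, which is why checking the `WWW-Authenticate` header is a useful first diagnostic.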
If the site is using NTLM authentication, the headers attribute of the resulting HTTPError should say so: