I am trying to write code that lets my Node.js server scrape another website that uses NTLM authentication. When I arrive on the page, a popup appears in which I am supposed to enter credentials to access the site. However, that popup does not appear to be part of Chrome but part of the OS: no matter what I try, I can't find it in the elements of the page. (I am running Windows 10 on my development machine, but the machine that will run the scrape is running Linux.) Further evidence for this is that when I use Nightmare to land on the site, the popup does not appear.
I found that this popup is an attempt to do NTLM authentication. I worked this out by logging the response headers, which include the header 'www-authenticate': 'Negotiate, NTLM'.
I have never done NTLM authentication before, and I did a lot of research, consulting many articles.
I may just be misunderstanding it, but from what I read, NTLM is an authentication protocol in which a server and a client exchange several messages: the client must decode a challenge that the server sends to it and send back a response containing its own encrypted message (in Microsoft's documentation these are referred to as NTLM type 1, type 2, and type 3 messages). Once the client has completed these successive "handshakes", it is given a token, which is to be placed in the Authorization header of all future requests and which allows the client to access resources at that domain.
But when I make a GET request (using the request module) and inspect the www-authenticate header, I do not see a Base64-encoded challenge that I need to decode. All it says is 'Negotiate, NTLM'. There is, however, another header in the response that looks like this: 'set-cookie': ['PMPRSTTCKT=!lcWXt/hXXO4xZHh0zm3oec8PLsnWcoTFk3sxytyUFAh/vYSo90MBtWpKI48G5L7mFdWMteNN5Q2Khfo=; expires=Fri, 01-Mar-2019 21:06:12 GMT; path=/; Httponly; Secure']. I am not certain whether this cookie is actually the challenge from the server, placed in the wrong header through poor coding by the development team.
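To make the check concrete, here is a minimal sketch of how I am inspecting the header. The helper names `parseWwwAuthenticate` and `hasChallenge` are made up for illustration, and the parsing assumes each scheme is followed by at most one Base64 token (it would not handle schemes with quoted parameters such as Basic's realm).

```javascript
// Hypothetical helper: split a WWW-Authenticate value into its schemes
// and the optional token that follows each scheme name.
// Assumption: schemes are comma-separated and tokens contain no commas.
function parseWwwAuthenticate(headerValue) {
  return headerValue
    .split(',')
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const [scheme, ...rest] = part.split(/\s+/);
      const token = rest.join(' ');
      return { scheme, token: token.length > 0 ? token : null };
    });
}

// Hypothetical helper: true when any advertised scheme carries a token.
function hasChallenge(headerValue) {
  return parseWwwAuthenticate(headerValue).some((c) => c.token !== null);
}
```

With the header value I am receiving, `hasChallenge('Negotiate, NTLM')` is false: both schemes are listed, but neither has a token after it.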
As I understand it, this is not the expected behavior for NTLM authentication; I am expecting the www-authenticate header to contain the challenge. Do any of you have experience with NTLM authentication, and can you shed some light on what is going on here?
The goal is to use Nightmare to scrape the site, download several reports, and run some analytics on the downloaded reports. Several problems need to be solved before this can be done:
- Obviously, I need to authenticate with the site somehow.
- I need to give Nightmare access to the authentication token, or do something within Nightmare or Electron to perform NTLM authentication, so that my scraper can go through the site uninterrupted.
You can use the following module, which already implements the NTLM authentication protocol in JavaScript/Node.js: https://www.npmjs.com/package/httpntlm
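A minimal sketch of how a request with that module might look, assuming it has been installed with `npm install httpntlm`. The helpers `buildNtlmOptions` and `fetchWithNtlm` are hypothetical names introduced here for illustration, and the URL and credentials in the usage note are placeholders, not values from your site.

```javascript
// Hypothetical helper: build the options object that httpntlm expects.
// Accepts either "DOMAIN\\user" or a plain "user" account name.
function buildNtlmOptions(url, account, password) {
  const parts = account.split('\\');
  const hasDomain = parts.length === 2;
  return {
    url,
    username: hasDomain ? parts[1] : account,
    password,
    domain: hasDomain ? parts[0] : '',
    workstation: '', // assumption: left empty here
  };
}

// Hypothetical helper: perform a GET and let the module run the NTLM handshake.
function fetchWithNtlm(options, callback) {
  // Required lazily so the helper above can be used without the module.
  const httpntlm = require('httpntlm');
  httpntlm.get(options, (err, res) => {
    if (err) return callback(err);
    // res.headers and res.body hold the response after the handshake.
    callback(null, res);
  });
}
```

For example, `fetchWithNtlm(buildNtlmOptions('https://example.com/reports', 'MYDOMAIN\\myuser', 'secret'), (err, res) => { ... })` would log in as `myuser` in `MYDOMAIN`. Whether the resulting session can then be handed over to Nightmare depends on how the site tracks authenticated clients, so treat this as a starting point rather than a complete solution.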