I need to parse a site that sits behind an ADFS service,
and I am struggling to authenticate to it.
Are there any options to get in?
From what I can see, most solutions are for backend applications or for "system users" (with app_id and app_secret). In my case I can't use those; I only have a login and password.
Example of the problem:
In Chrome
I open www.example.com
and it redirects me to https://login.microsoftonline.com/
and then to https://federation-sts.example.com/adfs/ls/?blabla
with a login and password form.
How can I get access to it with Python 3?
ADFS uses complicated redirection and CSRF protection techniques, so it is better to use a browser automation tool to perform the authentication and parse the webpage afterwards. I recommend the selenium toolkit with Python bindings. Here is a working example:
This script calls Microsoft Edge to open the website. It injects the username and password into the correct DOM elements and then lets the browser handle the rest. It has been tested on the webpage "https://login.microsoftonline.com". You may need to modify it to suit your website.
To answer your question "how to get in with Python": I am assuming you want to perform a web scraping operation on pages that are secured by Azure AD authentication.
In this kind of scenario, you have to do the following steps.
Step 1:
For this script we will only need to import the following:
First, we would like to create our session object. This object will allow us to persist the login session across all our requests.
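A minimal sketch of the imports and the session object, assuming the requests and lxml packages (lxml is the parser mentioned in the next step):

```python
import requests
from lxml import html  # used in the later steps to parse the pages

# One Session object for the whole script: cookies set by any response are
# stored in session.cookies and sent again on later requests.
session = requests.Session()
```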
Second, we would like to extract the CSRF token from the web page; this token is used during login. For this example we are using lxml and XPath, but we could have used a regular expression or any other method that extracts this data.
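A sketch of the extraction step. Since the name of the token field differs from site to site, this version simply collects every hidden input of the login form. The helper name extract_hidden_fields and the field name csrf_token in the sample page are made up for illustration.

```python
from lxml import html


def extract_hidden_fields(page_html):
    """Return a dict of name -> value for all hidden inputs inside forms."""
    tree = html.fromstring(page_html)
    fields = {}
    for field in tree.xpath('//form//input[@type="hidden"]'):
        name = field.get("name")
        if name:  # skip inputs without a name, they are not submitted
            fields[name] = field.get("value", "")
    return fields


# Sample login page, for illustration only.
sample_page = """
<html><body>
  <form method="post" action="/login">
    <input type="hidden" name="csrf_token" value="abc123">
    <input type="text" name="username">
    <input type="password" name="password">
  </form>
</body></html>
"""

hidden = extract_hidden_fields(sample_page)
print(hidden)
```

In the real script, page_html would be the text of the response to session.get(login_url).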
Next, we would like to perform the login phase. In this phase, we send a POST request to the login URL, using the payload as the form data. We also send a Referer header with the request, set to the same URL.
The payload is a dictionary containing the user name, the password, and any other fields the login form expects (such as the token extracted above).
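A sketch of the login request. The form field names UserName and Password are assumptions; check the name attributes of the inputs on your own login form. The login function itself is a hypothetical helper.

```python
import requests


def login(session: requests.Session, login_url, username, password, hidden_fields):
    """POST the login form and return the response."""
    # Start from the hidden fields (token etc.) and add the credentials.
    payload = dict(hidden_fields)
    payload["UserName"] = username  # assumed field name
    payload["Password"] = password  # assumed field name
    response = session.post(
        login_url,
        data=payload,
        headers={"Referer": login_url},
    )
    # This only catches HTTP errors; a failed login usually still returns
    # 200 with the form shown again, so check the page content as well.
    response.raise_for_status()
    return response


# Usage, with a session created by requests.Session():
#
#   response = login(session, "https://federation-sts.example.com/adfs/ls/",
#                    "user", "secret", hidden)
```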
Note: this is just an example.
Step 2:
Scrape content
Now that we have logged in successfully, we can perform the actual scraping.
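A sketch of the scraping step, reusing the logged-in session so that its cookies are sent along. The scrape helper and the URL and XPath query in the usage comment are placeholders.

```python
import requests
from lxml import html


def scrape(session: requests.Session, url, xpath_query):
    """Fetch url with the logged-in session and return the XPath matches."""
    response = session.get(url)
    response.raise_for_status()
    tree = html.fromstring(response.content)
    return tree.xpath(xpath_query)


# Usage, with the session that performed the login:
#
#   titles = scrape(session, "https://www.example.com/reports", "//h2/text()")
```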
In other words, you need to find out which request payload the Azure AD login expects, create a session object, log in with it, and then finally do the scraping.
Here is a very good example of web scraping of a secured website.
Hope it helps.