Logging in to a web site with Python (urllib,urlli

Posted 2019-07-20 16:11

Question:

Preface: I understand that there are many answers to similar questions on Stack Overflow. However, I haven't found anything relating to ASPX logins, nor an exact case such as this.

Problem: I need to determine what information is necessary to log in to https://cableone.net/login.aspx so that I can scrape information from there.

Progress: Thus far I have found the input fields in the source of login.aspx and have scraped together a script in Python with urllib, urllib2, and cookielib. In my script I ignored anything that had a blank value.

<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUIMzc1NzEwOTZkZFAEfkjXC+VNsqYoayGxa5/q4srT" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBAK6lKDUCwLVx7ufCQL/+N3OBwLFgNGYD6KeUd6uNDBwc5zcR0u4hqrwv1fM" />
<input name="ctl00$plhMain$txtUserName" type="text" id="ctl00_plhMain_txtUserName" />
<input name="ctl00$plhMain$txtPassword" type="password" id="ctl00_plhMain_txtPassword" />
<input type="submit" name="ctl00$plhMain$btnLogin" value="Login" id="ctl00_plhMain_btnLogin" />

I then used the above input values with Python and urllib in the following script.

import urllib, urllib2, cookielib


url = 'https://myaccount.cableone.net/Login.aspx'

# cookie-aware opener so that any session cookie is kept between requests
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# determine what I need to change with these values
formValues = {
    "__VIEWSTATE":"/wEPDwUIMzc1NzEwOTZkZFAEfkjXC+VNsqYoayGxa5/q4srT",
    "__EVENTVALIDATION":"/wEWBAK6lKDUCwLVx7ufCQL/+N3OBwLFgNGYD6KeUd6uNDBwc5zcR0u4hqrwv1fM",
    "ctl00$plhMain$txtUserName":"myAccount",
    "ctl00$plhMain$txtPassword":"myPassword"
    }

data = urllib.urlencode(formValues)

# passing data makes this a POST request
response = opener.open(url, data)
thePage = response.read()
httpheaders = response.info()
print thePage

Answer 1:

The approach you outlined will be difficult if the form is dynamic in any way. A more universal way is to install Google Chrome Canary, which has good developer tools, click "inspect page", go to the "Network" tab, and check "Preserve log". (You may need the Canary version because, if I'm not mistaken, the regular one doesn't catch some of the data.)

With all this open, click "login", and you'll see all the requests and headers and POST data. This will give you all the POST data that is sent to the server.
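When you compare the captured POST data with the page source, keep in mind that a browser also submits fields whose value is empty, as well as the name and value of the submit button that was clicked. As a rough sketch of reading the fields from the page instead of hard-coding them, the following uses only the standard library. It is written for Python 3, where `urlencode` lives in `urllib.parse`. `InputCollector`, `collect_form_fields` and `SAMPLE` are hypothetical names introduced here, and the sample HTML is an abridged copy of the inputs quoted in the question, not a live response from the site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also end up here.
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            # A missing or empty value attribute is kept as an empty string.
            self.fields[name] = attr.get("value") or ""


def collect_form_fields(html):
    """Return a dict of all named inputs found in the given HTML."""
    parser = InputCollector()
    parser.feed(html)
    return parser.fields


# Abridged copy of the inputs from the question, used so the sketch runs offline.
SAMPLE = '''
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUIMzc1NzEwOTZkZFAEfkjXC+VNsqYoayGxa5/q4srT" />
<input name="ctl00$plhMain$txtUserName" type="text" id="ctl00_plhMain_txtUserName" />
<input type="submit" name="ctl00$plhMain$btnLogin" value="Login" id="ctl00_plhMain_btnLogin" />
'''

fields = collect_form_fields(SAMPLE)
fields["ctl00$plhMain$txtUserName"] = "myAccount"  # placeholder credential
body = urlencode(fields)
print(body)
```

In a real script the HTML would come from a GET of the login page through the same cookie-aware opener, so that values such as `__VIEWSTATE` are fresh for each session. Whether this particular site needs that is an assumption to check against the Network tab.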

Now you can send that data from your script and remove fields one at a time to see which ones the server actually requires. Another option for testing the requests, by the way, is Advanced REST Client.
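The "remove one at a time" step can be sketched as a small loop. This is only an illustration under assumptions: `find_required_fields` and `login_ok` are hypothetical names, and `fake_login_ok` is a stand-in for the real request so that the example runs offline. How you detect a successful login (a redirect, a cookie, some text on the returned page) depends on the site and is not shown here.

```python
def find_required_fields(fields, login_ok):
    """Drop fields one at a time, keeping only those the login needs.

    `login_ok` is a caller-supplied function (hypothetical) that submits
    the given dict and returns True when the login succeeded.
    """
    required = dict(fields)
    for name in list(fields):
        trial = {k: v for k, v in required.items() if k != name}
        if login_ok(trial):
            required = trial  # the login still works without this field
    return required


# Stand-in for a real POST: this fake server accepts any submission
# that contains these three fields.
def fake_login_ok(trial):
    needed = {"__VIEWSTATE", "user", "password"}
    return needed <= set(trial)


captured = {
    "__EVENTTARGET": "",
    "__VIEWSTATE": "abc",
    "user": "myAccount",
    "password": "myPassword",
}

minimal = find_required_fields(captured, fake_login_ok)
print(sorted(minimal))
```

Against a real site, each trial should probably start from a fresh session, since a cookie left over from an earlier successful attempt could make a later trial look like it passed.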