Submitting a post request to an aspx page

2019-01-26 07:55发布

问题:

I have an ASPX page at https://searchlight.cluen.com/E5/CandidateSearch.aspx with a form on it, that I'd like to submit and parse for information.

Using Python's urllib and urllib2 I created a post request with the proper headers and user agent. But the resulting html response does not contain the expected table of results. Am I misunderstanding or am I missing any obvious details?

    import urllib
    import urllib2

    headers = {
        'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13)         Gecko/2009073022 Firefox/3.0.13',
        'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml; q=0.9,*/*; q=0.8',
        'Content-Type': 'application/x-www-form-urlencoded'
    }
    # obtained these values from viewing the source of https://searchlight.cluen.com/E5/CandidateSearch.aspx
    viewstate = '/wEPDwULLTE3NTc4MzQwNDIPZBYCAg ... uJRWDs/6Ks1FECco='
    eventvalidation = '/wEWjQMC8pat6g4C77jgxg0CzoqI8wgC3uWinQQCwr/ ... oPKYVeb74='
    url = 'https://searchlight.cluen.com/E5/CandidateSearch.aspx'
    formData = (
        ('__VIEWSTATE', viewstate),
        ('__EVENTVALIDATION', eventvalidation),
        ('__EVENTTARGET',''),
        ('__EVENTARGUMENT',''),
        ('textcity',''),
        ('dropdownlistposition',''),
        ('dropdownlistdepartment',''),
        ('dropdownlistorderby',''),
        ('textsearch',''),
    )

    # change user agent
    from urllib import FancyURLopener
    class MyOpener(FancyURLopener):
        version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127         Firefox/2.0.0.11'

    myopener = MyOpener()

    # encode form data in post-request format
    encodedFields = urllib.urlencode(formData)

    f = myopener.open(url, encodedFields)
    print f.info()

    try:
      fout = open('tmp.htm', 'w')
    except:
      print('Could not open output file\n')

    fout.writelines(f.readlines())
    fout.close()

There are several questions on this topic that were helpful (such as how to submit query to .aspx page in python) but I'm stuck on this and asking for additional help, if that is possible.

The resulting html page is saying I may need to log in, but the aspx page displays in my browser without any login.

Here are the results from info():

Connection: close Date: Tue, 07 Jun 2011 17:05:26 GMT Server: Microsoft-IIS/6.0 X-Powered-By: ASP.NET X-AspNet-Version: 2.0.50727 Cache-Control: private Content-Type: text/html; charset=utf-8 Content-Length: 1944

回答1:

ASP.Net uses a security feature that protects against tampering with the ViewState by embedding specific information in it.

More than likely, the server is rejecting your request because the ViewState is being treated as though it were tampered with. I can't say this with absolute certainty, but ASP.Net has several security features that are built in to the framework that may be preventing a direct post.

If session is involved at all, then you will also need to take that into account. To simulate what the browser is doing you will need to perform the following steps:

  1. Request the page.
  2. Save the collection of cookies to a variable.
  3. Extract the ViewState to a variable.
  4. Post with the appropriate form values, passing both the saved cookies and ViewState information along with the request.

A lot of work I know, but not too awfully difficult. Again, this may not be the sole source of your problems, but it is worth reading up on in order to start troubleshooting.



回答2:

I tried mechanize and urllib2, and mechanize handles cookies better. I can submit the form simply by specifying with mechanize:

    browser= mechanize.Browser()
    browser.select_form(form_name)
    browser.set_value("Page$Next", name="pagenumber")     

It was not necessary to replicate the post request manually, and mechanize in this case was able to handle a form that relies on javascript.