I have an ASPX page at https://searchlight.cluen.com/E5/CandidateSearch.aspx with a form on it, that I'd like to submit and parse for information.
Using Python's urllib and urllib2 I created a post request with the proper headers and user agent. But the resulting html response does not contain the expected table of results. Am I misunderstanding or am I missing any obvious details?
import urllib
import urllib2
headers = {
'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml; q=0.9,*/*; q=0.8',
'Content-Type': 'application/x-www-form-urlencoded'
}
# obtained these values from viewing the source of https://searchlight.cluen.com/E5/CandidateSearch.aspx
viewstate = '/wEPDwULLTE3NTc4MzQwNDIPZBYCAg ... uJRWDs/6Ks1FECco='
eventvalidation = '/wEWjQMC8pat6g4C77jgxg0CzoqI8wgC3uWinQQCwr/ ... oPKYVeb74='
url = 'https://searchlight.cluen.com/E5/CandidateSearch.aspx'
formData = (
('__VIEWSTATE', viewstate),
('__EVENTVALIDATION', eventvalidation),
('__EVENTTARGET',''),
('__EVENTARGUMENT',''),
('textcity',''),
('dropdownlistposition',''),
('dropdownlistdepartment',''),
('dropdownlistorderby',''),
('textsearch',''),
)
# change user agent
from urllib import FancyURLopener
class MyOpener(FancyURLopener):
version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
myopener = MyOpener()
# encode form data in post-request format
encodedFields = urllib.urlencode(formData)
f = myopener.open(url, encodedFields)
print f.info()
try:
fout = open('tmp.htm', 'w')
except:
print('Could not open output file\n')
fout.writelines(f.readlines())
fout.close()
There are several questions on this topic that were helpful (such as how to submit query to .aspx page in python) but I'm stuck on this and asking for additional help, if that is possible.
The resulting html page is saying I may need to log in, but the aspx page displays in my browser without any login.
Here are the results from info():
Connection: close Date: Tue, 07 Jun 2011 17:05:26 GMT Server: Microsoft-IIS/6.0 X-Powered-By: ASP.NET X-AspNet-Version: 2.0.50727 Cache-Control: private Content-Type: text/html; charset=utf-8 Content-Length: 1944