Following a javascript postback using COM + IE aut

Posted 2019-07-31 14:08

Question:

I want to automate archiving the data on this page, http://energywatch.natgrid.co.uk/EDP-PublicUI/Public/InstantaneousFlowsIntoNTS.aspx, and uploading it into a database.

I have been using Python and win32com to do this on other pages (I am behind a corporate proxy with no direct net access, hence I am driving IE). Is there any way to extract and save the CSV data that is returned when clicking the "Click here to download data" link at the bottom? This link is a JavaScript postback, and using it would be much easier than reformatting the page itself into CSV.

Of course, I'm not necessarily committed to using Python if a simpler alternative can be suggested.

Thanks

Answer 1:

Here's a better way, using the mechanize library.


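import mechanize

b = mechanize.Browser()
# Route HTTP traffic through the corporate proxy (replace with your own host:port).
b.set_proxies({'http': 'yourproxy.corporation.com:3128'})

# Present an IE-like User-Agent, then load the page that contains the form.
b.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')]
b.open("http://energywatch.natgrid.co.uk/EDP-PublicUI/Public/InstantaneousFlowsIntoNTS.aspx")

# __EVENTTARGET is read-only by default; make it writable and set the postback target.
b.select_form(name="form1")
b.form.find_control(name='__EVENTTARGET').readonly = False
b.form['__EVENTTARGET'] = 'a1'

# Submit the form and print the response body.
print(b.submit().read())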

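Note how you can specify that mechanize should use a proxy server (this is also possible using plain urllib). Also note how ASP.NET's JavaScript postback is simulated: the __EVENTTARGET form field is made writable and set to the postback target before the form is submitted.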
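To make it clearer what the simulated postback amounts to, here is a minimal sketch using only the Python 3 standard library. It assumes the page follows the usual ASP.NET WebForms pattern, where the link fills the hidden `__EVENTTARGET` and `__EVENTARGUMENT` fields and submits the form along with the other hidden fields. The names `HiddenFieldParser` and `build_postback_body` and the sample HTML are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_body(html, event_target, event_argument=""):
    """Return the URL-encoded POST body a postback for event_target would send."""
    parser = HiddenFieldParser()
    parser.feed(html)
    fields = dict(parser.fields)
    # Overwrite the two fields that the page's JavaScript would normally fill in.
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields)


# Hypothetical stand-in for the downloaded page; the real markup will differ.
SAMPLE_HTML = """
<form name="form1" method="post" action="Example.aspx">
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="hidden" name="__EVENTARGUMENT" value="" />
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
</form>
"""

body = build_postback_body(SAMPLE_HTML, "a1")
```

The resulting string would then be sent as the body of a POST request to the page's URL. mechanize does the equivalent work for you when you call `submit()`.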

Edit:

If your proxy server is using NTLM authentication, that could be the problem. AFAIK urllib2 does not handle NTLM authentication. You could try the NTLM Authorization Proxy Server. From the readme file:


WHAT IS 'NTLM Authorization Proxy Server'?

'NTLM Authorization Proxy Server' is a proxy-like software, that will authorize you at MS proxy server and at web servers (ISS especially) using MS proprietary NTLM authorization method and it can change some values in your client's request header so that those requests will look like ones made by MS IE. It is written in Python language. See www.python.org.
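If you go this route, your script talks to the locally running NTLM Authorization Proxy Server instead of the corporate proxy directly. The following is a minimal sketch with the Python 3 standard library. The address 127.0.0.1:5865 is an assumption (check the listen port in your own server configuration), and `make_proxy_handler` and `make_opener` are hypothetical helper names.

```python
import urllib.request

# Assumed address of the local NTLM proxy; adjust to match your configuration.
LOCAL_PROXY = "http://127.0.0.1:5865"


def make_proxy_handler(proxy_url=LOCAL_PROXY):
    """Build a handler that sends plain-HTTP requests through proxy_url."""
    return urllib.request.ProxyHandler({"http": proxy_url})


def make_opener(proxy_url=LOCAL_PROXY):
    """Build an opener whose HTTP requests go through the local proxy."""
    return urllib.request.build_opener(make_proxy_handler(proxy_url))


handler = make_proxy_handler()
opener = make_opener()
# opener.open(url) would now route the request via the local proxy,
# which in turn authenticates against the corporate proxy.
```

With mechanize, the equivalent would be to pass the local address to `set_proxies`, in the same way as the corporate proxy in the example above.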