Automate website log-in and form filling?

2020-02-08 09:54发布

问题:

I'm trying to log in to a website and save an HTML page automatically (I want to be able to do this on a regular time interval). From the surface, this is a typical modern website where, if the user navigates directly to a "locked" URL, a log-in form pops up, and after logging in, the user is redirected to the intended page.

I gave mechanize a shot (http://wwwsearch.sourceforge.net/mechanize/) but it wasn't finding some form elements which were needed for login (hidden elements that have some values put in by a javascript function that runs when the user clicks the "log in" button).

I played a bit with the "web browser" control in .NET but quickly lost interest because I couldn't even get it to submit a query on the Google page.

I don't care what the language is; I'll learn it to solve this problem. At a minimum it has to work in Windows.

A simple example, say, typing in a query into the Google search box would be a great bonus.

回答1:

In my experience, the most reliable way is to use javascript. It works well in .Net. To test, browse to the following addresses one after another in Firefox or Internet Explorer:

http://www.google.com
javascript:function f(){document.forms[0]['q'].value='stackoverflow';}f();
javascript:document.forms[0].submit()

That performs a search for "stackoverflow" on Google. To do it in VB .Net using the webbrowser control, do this:

WebBrowser1.Navigate("http://www.google.com")
Do While WebBrowser1.IsBusy OrElse WebBrowser1.ReadyState <> WebBrowserReadyState.Complete
    Threading.Thread.Sleep(1000)
    Application.DoEvents()
Loop
WebBrowser1.Navigate("javascript:function%20f(){document.forms[0]['q'].value='stackoverflow';}f();")
Threading.Thread.Sleep(2000) 'wait for javascript to run
WebBrowser1.Navigate("javascript:document.forms[0].submit()")
Threading.Thread.Sleep(2000) 'wait for javascript to run

Notice how the space in the URL is converted to %20. I'm not certain if this is necessary but it can't hurt. It is important that the first javascript be in a function. The calls to Sleep() are to wait for Google to load and also for the javascript stuff. The Do While Loop might run forever if the page fails to load so for automation purposes have a counter that will timeout after, say, 60 seconds.

Of course, for Google you can just navigate directly to www.google.com?q=stackoverflow but if your site has hidden input fields, etc, then this is the way to go. Only works for HTML sites - flash is a whole other matter.



回答2:

If I understand you right, you want to log in to only one webpage, and that form always stays the same. You could either reverse engineer the java script, or debug it via a javascript debugger in the browser (e.g. firebug for firefox). Or you can fill in the form in your browser and look at the http request via a network packet sniffer. Once you have all required form data to submit, you can do the same with your program (thats what I did the last time I had a pretty similar task to do). dont forget to store all cookie data you requested back from the webserver and send it with the next request, to 'stay logged in'.



回答3:

Its being already discussed here.

Basically its gist is you can use selenium, an open source web automation tool, which has api library available in various languages like java, ruby, etc.



回答4:

Neoload can handle the form filling with authentication, assuming you don't want to collect data, just perform actions. It's a web stress tool, so it's not really meant to be used as a time-based service, but you COULD just leave it running.



回答5:

I've used Ruby and Watir (a web app testing suite) for something similar, but it was a very small task (basically visiting URLs from a text file and downloading an image).

There's also an extension called iMacros that can do some automation, but I'm not personally familiar with it (just aware of it).



回答6:

"I'm trying to log in to a website and save an HTML page automatically"

 SAVEAS TYPE=HTM FOLDER=C: FILE=page.html

https://addons.mozilla.org/en-US/firefox/addon/imacros-for-firefox/?src=search

This commands played in iMacros addon will save the page on C: drive and name it page.html

Also,

URL GOTO=www.website.com

Goes on the particular website you want to save. You can also use scripting in iMacros and set different websites in macro.