I want to scrape the contents of a webpage. The contents are produced after a form on that site has been filled in and submitted.
I've read up on how to scrape the resulting content/webpage, but how do I programmatically submit the form?
I'm using Python and have read that I might need to get the original webpage with the form, parse it, get the form parameters and then do X?
Can anyone point me in the right direction?
From a similar question - options-for-html-scraping - you can learn that with Python you can use Beautiful Soup.
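As a rough sketch of that approach, assuming Beautiful Soup 4 and a made-up form URL, you could fetch the page and pull out the form's action, method, and field names before submitting anything:

```python
import urllib.request
from bs4 import BeautifulSoup

# Made-up URL for illustration; use the page that actually hosts the form.
html = urllib.request.urlopen("http://www.example.com/form.html").read()
soup = BeautifulSoup(html, "html.parser")

form = soup.find("form")
action = form.get("action")           # where the form submits to
method = form.get("method", "get")    # GET unless the form says otherwise
fields = {field.get("name"): field.get("value", "")
          for field in form.find_all("input")
          if field.get("name")}
print(action, method, fields)
```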
Using Python, I think it takes the following steps: parse the web page that contains the form (this explains form elements in an HTML file), then generate an HTTP request that submits the form data.
You'll need to generate an HTTP request containing the data for the form.
The form will look something like:
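```html
<!-- Reconstructed example; the action URL and method match the
     description that follows. -->
<form action="http://www.example.com/submit.php" method="post">
  ...
</form>
```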
This tells you the URL to request is www.example.com/submit.php and that your request should be a POST.
The form will contain several input items, e.g.:
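```html
<!-- Field names and values taken from the example query string below. -->
<input type="text" name="itemnumber" value="5234">
<input type="text" name="otherinput" value="othervalue">
```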
You need to take all of these input name=value pairs, URL-encode them, and append the resulting string to your requested URL, which then becomes www.example.com/submit.php?itemnumber=5234&otherinput=othervalue and so on. That works fine for GET; POST is a little trickier, since the encoded pairs are sent in the request body rather than in the URL.
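A minimal sketch of both cases, using Python's standard urllib modules (Python 3 names) and the example URL and fields from above:

```python
import urllib.parse
import urllib.request

url = "http://www.example.com/submit.php"
fields = {"itemnumber": "5234", "otherinput": "othervalue"}

# GET: append the URL-encoded name=value pairs as a query string.
query = urllib.parse.urlencode(fields)
response = urllib.request.urlopen(url + "?" + query)
print(response.read())

# POST: send the same encoded pairs in the request body instead.
body = urllib.parse.urlencode(fields).encode("ascii")
response = urllib.request.urlopen(url, data=body)
print(response.read())
```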
Just follow S.Lott's links for some much easier-to-use library support :P
You can do it with JavaScript. If the form is something like:
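```html
<!-- Hypothetical form; the name "myform" and the field name are
     assumptions so the script below can refer to them. -->
<form name="myform" action="http://www.example.com/submit.php" method="post">
  <input type="text" name="itemnumber" value="">
</form>
```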
Then you can do this in JavaScript:
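```javascript
// Fill in the (assumed) field, then submit the (assumed) form by name.
document.myform.itemnumber.value = "5234";
document.myform.submit();
```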
You can use the "onClick" attribute of links or buttons to invoke this code. To invoke it automatically when a page is loaded, use the "onLoad" attribute of the <body> element:
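```html
<!-- Submits the hypothetical form as soon as the page has loaded. -->
<body onLoad="document.myform.submit();">
```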