Programmatic Form Submit

Posted 2019-02-19 16:27

I want to scrape the contents of a webpage. The contents are produced after a form on that site has been filled in and submitted.

I've read about how to scrape the resulting content/webpage, but how do I programmatically submit the form?

I'm using Python and have read that I might need to fetch the original webpage with the form, parse it, get the form parameters and then do X?

Can anyone point me in the right direction?

4 answers
贪生不怕死
Reply #2 · 2019-02-19 16:42

From a similar question, options-for-html-scraping, you can learn that in Python you can use Beautiful Soup.

Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Three features make it powerful:

  1. Beautiful Soup won't choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and run away.
  2. Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. You don't have to create a custom parser for each application.
  3. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't autodetect one. Then you just have to specify the original encoding.

The unusual name caught the attention of our host, November 12, 2008.

可以哭但决不认输i
Reply #3 · 2019-02-19 17:01

Using Python, I think it takes the following steps:

  1. Parse the web page that contains the form and find the form's submit address (its action) and submit method ("post" or "get").

This explains form elements in an HTML file.

  2. Use urllib2 (urllib.request in Python 3) to submit the form. You may need functions such as urlencode and quote from urllib (urllib.parse in Python 3) to build the URL, and the data for the POST method. Read the library documentation for details.
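Step 1 can be sketched with the standard library's html.parser, which avoids a third-party dependency. This is a simplified illustration under stated assumptions: FormParser and parse_forms are hypothetical names, and only <input> elements are collected (no <select>, <textarea>, checkbox state or submit buttons).

```python
from html.parser import HTMLParser


class FormParser(HTMLParser):
    """Hypothetical helper: collect action, method and named <input> fields per <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> found
        self._current = None  # the form we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                # HTML treats a missing method as GET
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            name = attrs.get("name")
            if name:  # inputs without a name are never submitted
                self._current["fields"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def parse_forms(html):
    """Return a list of {'action', 'method', 'fields'} dicts, one per form."""
    parser = FormParser()
    parser.feed(html)
    parser.close()
    return parser.forms


page = ('<form action="submit.php" method="POST">'
        '<input type="text" name="itemnumber" value="5234">'
        '<input type="hidden" name="token" value="abc"></form>')
print(parse_forms(page)[0])
# → {'action': 'submit.php', 'method': 'post', 'fields': {'itemnumber': '5234', 'token': 'abc'}}
```

Hidden inputs are worth keeping: many sites expect them (for example session or anti-forgery tokens) to be sent back unchanged.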
别忘想泡老子
Reply #4 · 2019-02-19 17:01

You'll need to generate an HTTP request containing the data for the form.

The form will look something like:

<form action="submit.php" method="POST"> ... </form>

This tells you that the URL to request is www.example.com/submit.php (assuming the form page itself sits at the site root) and that your request should be a POST.
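In general a relative action is resolved against the URL of the page that contains the form, the way a browser would resolve it. A minimal sketch using urllib.parse.urljoin; resolve_action is a hypothetical helper name and the page URLs are made up for illustration.

```python
from urllib.parse import urljoin


def resolve_action(page_url, action):
    """Hypothetical helper: absolute URL a form with this action submits to."""
    # An empty or missing action submits back to the page itself
    return urljoin(page_url, action or "")


print(resolve_action("http://www.example.com/index.html", "submit.php"))
# → http://www.example.com/submit.php
print(resolve_action("http://www.example.com/search/index.html", "submit.php"))
# → http://www.example.com/search/submit.php
print(resolve_action("http://www.example.com/search/index.html", "/submit.php"))
# → http://www.example.com/submit.php
```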

Inside the form will be several input items, e.g.:

<input type="text" name="itemnumber">

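You need to URL-encode all of these name=value pairs and join them with &. For a GET request, append that string to the requested URL as a query string, so it becomes www.example.com/submit.php?itemnumber=5234&otherinput=othervalue and so on. POST is a little trickier: the same encoded string is sent in the request body instead of the URL, with the Content-Type header set to application/x-www-form-urlencoded.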
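Both cases can be sketched with Python 3's standard library: urlencode builds the name=value string, and urllib.request.Request carries it either in the URL (GET) or in the body (POST). build_request is a hypothetical helper; it only builds the request and does not send it. It assumes plain text fields (no file uploads, which need multipart/form-data instead).

```python
from urllib.parse import urlencode, urlsplit, urlunsplit
from urllib.request import Request


def build_request(action_url, method, fields):
    """Hypothetical helper: build (not send) a form-submission Request."""
    data = urlencode(fields)  # e.g. itemnumber=5234&otherinput=othervalue
    if method.lower() == "get":
        # GET: the encoded pairs become the URL's query string
        # (replacing any existing query, as browsers do for GET forms)
        url = urlunsplit(urlsplit(action_url)._replace(query=data))
        return Request(url, method="GET")
    # POST: the same encoded string goes in the request body
    return Request(
        action_url,
        data=data.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_request("http://www.example.com/submit.php", "post",
                    {"itemnumber": "5234", "otherinput": "othervalue"})
print(req.get_method(), req.full_url, req.data)
# → POST http://www.example.com/submit.php b'itemnumber=5234&otherinput=othervalue'
```

To actually submit, pass the request to urllib.request.urlopen(req) and read the response body, which is the result page you want to scrape.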


Just follow S.Lott's links for some much easier-to-use library support :P

Bombasti
Reply #5 · 2019-02-19 17:08

You can do it with JavaScript. If the form is something like:

<form name='myform' ...

Then you can do this in JavaScript:

<script>
function submitform()
{
document.myform.submit();
}
</script> 

You can use the "onClick" attribute of links or buttons to invoke this code. To invoke it automatically when the page loads, use the "onLoad" attribute of the <body> element:

<body onLoad="submitform()" ...>