Assume that there is a link "http://www.someHTMLPageWithTwoForms.com" which is basically an HTML page with two forms (say Form 1 and Form 2). I have code like this:
import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer
h = httplib2.Http('.cache')
response, content = h.request('http://www.someHTMLPageWithTwoForms.com')
for field in BeautifulSoup(content, parseOnlyThese=SoupStrainer('input')):
    if field.has_key('name'):
        print field['name']
This returns all the field names that belong to both Form 1 and Form 2 of my HTML page. Is there any way I can get only the field names that belong to a particular form (say, Form 2 only)?
Doing this kind of parsing would also be quite easy using lxml (which I personally prefer over BeautifulSoup because of its XPath support). For example, the following snippet would print all field names (if they have one) which belong to forms named "form2":
If you have the attribute name and value, you can search:
from BeautifulSoup import BeautifulStoneSoup
xml = '<person name="Bob"><parent rel="mother" name="Alice">'
xmlSoup = BeautifulStoneSoup(xml)
xmlSoup.findAll(name="Alice")
# []
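The empty list comes from `name` being the keyword that `findAll` uses for the tag name, so the call looks for `<Alice>` tags. Passing the attribute through the `attrs` dictionary avoids the clash. A minimal sketch, assuming the current `bs4` package on Python 3 rather than the BeautifulSoup 3 import used in this thread; the two-form markup and the form name `form2` are made up for illustration:

```python
from bs4 import BeautifulSoup

xml = '<person name="Bob"><parent rel="mother" name="Alice">'
soup = BeautifulSoup(xml, "html.parser")

# `name` as a keyword means "tag name", so this looks for <Alice> tags.
by_tag_name = soup.find_all(name="Alice")

# Attributes that are themselves called "name" go through the attrs dictionary.
by_attribute = soup.find_all(attrs={"name": "Alice"})

# Hypothetical page with two named forms, to tie this back to the question.
page = """
<form name="form1"><input name="user"><input name="email"></form>
<form name="form2"><input name="query"><input type="submit"></form>
"""
form2 = BeautifulSoup(page, "html.parser").find("form", attrs={"name": "form2"})

# Keep only the inputs that actually carry a name attribute.
form2_fields = [field["name"] for field in form2.find_all("input") if field.has_attr("name")]
print(form2_fields)
```

The same `attrs` lookup works for any attribute whose name collides with a keyword argument of the search methods.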
If you have the lxml and cssselect Python packages installed:
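The code for the lxml answers did not survive in this thread, so the following is only a sketch of what such a lookup could look like, not the original snippet. It assumes the target form has `name="form2"` and uses an invented page; the XPath query needs only lxml, while the CSS selector variant also needs the separate cssselect package and is skipped if that is missing:

```python
from lxml import html

# Hypothetical page standing in for the one in the question.
PAGE = """
<html><body>
<form name="form1"><input name="user"><input name="email"></form>
<form name="form2"><input name="query"><input type="submit"></form>
</body></html>
"""

doc = html.fromstring(PAGE)

# XPath: the name attribute of every input inside the form named "form2".
xpath_names = doc.xpath('//form[@name="form2"]//input/@name')

# CSS selector version of the same query; requires the cssselect package.
try:
    css_names = [el.get("name") for el in doc.cssselect('form[name="form2"] input[name]')]
except ImportError:
    css_names = None  # cssselect is not installed

print(xpath_names, css_names)
```

Inputs without a `name` attribute are dropped by both queries, which matches the "if they have one" condition in the question.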
If it's only 2 forms you may try this one:
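The snippet this answer referred to is missing here. One way to pick a form purely by its position, sketched with the standard library's `html.parser` instead of BeautifulSoup (a swapped-in technique, chosen so that nothing needs installing); the class name `FormFieldCollector` and the sample page are hypothetical:

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect input names grouped by the form they appear in."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one list of field names per <form>, in page order
        self._inside = False  # True while between <form> and </form>

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.forms.append([])
            self._inside = True
        elif tag == "input" and self._inside:
            name = dict(attrs).get("name")
            if name:  # skip inputs without a name attribute
                self.forms[-1].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._inside = False


# Hypothetical page with exactly two forms.
page = """
<form><input name="user"><input name="email"></form>
<form><input name="query"><input type="submit"></form>
"""

collector = FormFieldCollector()
collector.feed(page)

# Index 1 is the second form on the page.
second_form_fields = collector.forms[1]
print(second_form_fields)
```

Selecting by index is fragile: it breaks as soon as the page gains another form ahead of the one you want.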
If it's not only about the 2nd form, you can make it more specific (by an id or class attribute).
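A sketch of that more specific lookup, assuming the `bs4` package on Python 3 (not the BeautifulSoup 3 import from the question). The ids `login` and `search`, the classes, and the helper `field_names` are all invented for the example:

```python
from bs4 import BeautifulSoup

# Hypothetical page whose forms carry id and class attributes.
page = """
<form id="login" class="small"><input name="user"><input name="password"></form>
<form id="search" class="wide"><input name="query"><input type="submit"></form>
"""
soup = BeautifulSoup(page, "html.parser")


def field_names(form):
    """Return the name of every named input inside the given form tag."""
    return [field["name"] for field in form.find_all("input") if field.has_attr("name")]


# Select the form by its id ...
by_id = field_names(soup.find("form", id="search"))

# ... or by its class (class_ because `class` is a reserved word in Python).
by_class = field_names(soup.find("form", class_="wide"))

print(by_id, by_class)
```

An id is the safer handle of the two, since ids are meant to be unique within a page while a class may be shared by several forms.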