I would like to parse an HTML file with Python, and the module I am using is BeautifulSoup.
I have read that the function "find_all" is the same as "findAll", but when I tried
both of them, they behaved differently.
Can anyone tell me the difference?
import urllib2
from BeautifulSoup import BeautifulSoup

site = "http://share.dmhy.org/topics/list?keyword=TARI+TARI+team_id%3A407"

# Fetch the page and read the raw HTML.
rqstr = urllib2.Request(site)
rq = urllib2.urlopen(rqstr)
fchData = rq.read()

# Parse the HTML and collect every <tr> element.
soup = BeautifulSoup(fchData)
t = soup.findAll('tr')
print t
In BeautifulSoup version 4, the methods are exactly the same; the mixed-case versions (findAll, findAllNext, nextSibling, etc.) have all been renamed to conform to the Python style guide, but the old names are still available to make porting easier. See Method Names for a full list.

In new code, you should use the lowercase versions, so find_all, etc.

In your example, however, you are using BeautifulSoup version 3 (discontinued since March 2012, don't use it if you can help it), where only findAll() is available. Unknown attribute names (such as .find_all, which is only available in BeautifulSoup 4) are treated as if you are searching for a tag by that name. There is no <find_all> tag in your document, so None is returned for that.

From the source code of BeautifulSoup:
http://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/bs4/element.py#L1260
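To illustrate the attribute fallback described above, here is a minimal sketch in plain Python. ToyTag and its methods are hypothetical stand-ins written for this example, not BeautifulSoup's actual code; the sketch only assumes that, as in version 3, an unknown attribute name is forwarded to find() as a tag name.

```python
class ToyTag(object):
    """Hypothetical stand-in for a BeautifulSoup 3 tag (not library code)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def find(self, name):
        # Depth-first search for the first descendant with this tag name.
        for child in self.children:
            if child.name == name:
                return child
            found = child.find(name)
            if found is not None:
                return found
        return None

    def findAll(self, name):
        # Collect every descendant with this tag name, in document order.
        matches = []
        for child in self.children:
            if child.name == name:
                matches.append(child)
            matches.extend(child.findAll(name))
        return matches

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails, so
        # real methods such as findAll never reach this point.
        if attr.startswith('__'):
            raise AttributeError(attr)
        # Assumed fallback: treat the unknown attribute as a tag name.
        return self.find(attr)


doc = ToyTag('html', [ToyTag('table', [ToyTag('tr'), ToyTag('tr')])])

rows = doc.findAll('tr')   # real method: a list with both rows
first_table = doc.table    # unknown attribute: same as doc.find('table')
missing = doc.find_all     # no <find_all> tag exists, so this is None
```

In this sketch, writing doc.find_all('tr') would try to call None and raise a TypeError, which is one way the two spellings can appear to behave differently under version 3.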