So I wanted to get all of the pictures on this page(of the nba teams). http://www.cbssports.com/nba/draft/mock-draft
However, my code gives a bit more than that. It gives me,
<a href="/nba/teams/page/ORL"><img src="http://sports.cbsimg.net/images/nba/logos/30x30/ORL.png" alt="Orlando Magic" width="30" height="30" border="0" /></a>
How can I shorten it to only give me, http://sports.cbsimg.net/images/nba/logos/30x30/ORL.png.
My code:
import urllib2
from BeautifulSoup import BeautifulSoup
# or if your're using BeautifulSoup4:
# from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read())
rows = soup.findAll("table", attrs = {'class': 'data borderTop'})[0].tbody.findAll("tr")[2:]
for row in rows:
fields = row.findAll("td")
if len(fields) >= 3:
anchor = row.findAll("td")[1].find("a")
if anchor:
print anchor
To save all the images on http://www.cbssports.com/nba/draft/mock-draft,
To save any particular image on http://www.cbssports.com/nba/draft/mock-draft, use
I know this can be "traumatic", but for those automatically generated pages, where you just want to grab the damn images away and never come back, a quick-n-dirty regular expression that takes the desired pattern tends to be my choice (no Beautiful Soup dependency is a great advantage):
Hope this helps!
You can use this functions for getting the list of all images url from url.