Beautiful Soup to parse url to get another urls da

I need to parse a url to get a list of urls that link to a detail page. Then from that page I need to get all the details from that page. I need to do it this way because the detail page url is not regularly incremented and changes, but the event list page stays the same.

Basically:

example.com/events/
    <a href="http://example.com/events/1">Event 1</a>
    <a href="http://example.com/events/2">Event 2</a>

example.com/events/1
    ...some detail stuff I need

example.com/events/2
    ...some detail stuff I need

标签： python html parsing beautifulsoup

3条回答

够拽才男人

2楼-- · 2019-01-17 07:29

Use urllib2 to get the page, then use beautiful soup to get the list of links, also try scraperwiki.com

Edit:

Recent discovery: Using BeautifulSoup through lxml with

from lxml.html.soupparser import fromstring

is miles better than just BeautifulSoup. It lets you do dom.cssselect('your selector') which is a life saver. Just make sure you have a good version of BeautifulSoup installed. 3.2.1 works a treat.

dom = fromstring('<html... ...')
navigation_links = [a.get('href') for a in htm.cssselect('#navigation a')]

0人赞添加讨论(0) 举报

爷、活的狠高调

3楼-- · 2019-01-17 07:35

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen('http://yahoo.com').read()
soup = BeautifulSoup(page)
soup.prettify()
for anchor in soup.findAll('a', href=True):
    print anchor['href']

It will give you the list of urls. Now You can iterate over those urls and parse the data.

inner_div = soup.findAll("div", {"id": "y-shade"}) This is an example. You can go through the BeautifulSoup tutorials.

0人赞添加讨论(0) 举报

一夜七次

4楼-- · 2019-01-17 07:41

For the next group of people that come across this, BeautifulSoup has been upgraded to v4 as of this post as v3 is no longer being updated..

$ easy_install beautifulsoup4

$ pip install beautifulsoup4

To use in Python...

import bs4 as BeautifulSoup

0人赞添加讨论(0) 举报

Beautiful Soup to parse url to get another urls da

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间