Scrapy Python Craigslist Scraper

2020-07-17 04:50发布

I am trying to scrape Craigslist classifieds using Scrapy to extract items that are for sale.

I am able to extract date, post title, and post url but am having trouble extracting price.

For some reason the current code extracts all of the prices, but when I remove the // before the price span look up the price field returns as empty.

Can someone please review the code below and help me out?

from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from craigslist_sample.items import CraigslistSampleItem

    class MySpider(BaseSpider):
        name = "craig"
        allowed_domains = ["craigslist.org"]
        start_urls = ["http://longisland.craigslist.org/search/sss?sort=date&query=raptor%20660&srchType=T"]

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select("//p")
    items = []
    for titles in titles:
        item = CraigslistSampleItem()
        item['date'] = titles.select('span[@class="itemdate"]/text()').extract()
        item ["title"] = titles.select("a/text()").extract()
        item ["link"] = titles.select("a/@href").extract()
        item ['price'] = titles.select('//span[@class="itempp"]/text()').extract()
        items.append(item)
    return items

标签： python scrapy scraper Craigslist

1条回答

爱情/是我丢掉的垃圾

2楼-- · 2020-07-17 05:19

itempp appears to be inside of another element, itempnr. Perhaps it would work if you were to change //span[@class="itempp"]/text() to span[@class="itempnr"]/span[@class="itempp"]/text().

0人赞添加讨论(0) 举报

Scrapy Python Craigslist Scraper

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间