Scrapy Python Craigslist Scraper

Posted 2020-07-17 04:50

I am trying to scrape Craigslist classifieds using Scrapy to extract items that are for sale.

I am able to extract the date, post title, and post URL, but am having trouble extracting the price.

For some reason the current code returns every price on the page for each item, but when I remove the // before the price span lookup, the price field comes back empty.

Can someone please review the code below and help me out?

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from craigslist_sample.items import CraigslistSampleItem

    class MySpider(BaseSpider):
        name = "craig"
        allowed_domains = ["craigslist.org"]
        start_urls = ["http://longisland.craigslist.org/search/sss?sort=date&query=raptor%20660&srchType=T"]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            titles = hxs.select("//p")
            items = []
            for titles in titles:
                item = CraigslistSampleItem()
                item["date"] = titles.select('span[@class="itemdate"]/text()').extract()
                item["title"] = titles.select("a/text()").extract()
                item["link"] = titles.select("a/@href").extract()
                item["price"] = titles.select('//span[@class="itempp"]/text()').extract()
                items.append(item)
            return items

1 Answer
爱情/是我丢掉的垃圾
#2 · 2020-07-17 05:19

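itempp appears to be nested inside another element, itempnr. An XPath that starts with // searches the whole document rather than just the current p element, which is why every item ends up with all of the prices; without the //, span[@class="itempp"] only matches a direct child of p, and there isn't one. It may work if you change //span[@class="itempp"]/text() to span[@class="itempnr"]/span[@class="itempp"]/text().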
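To see the difference between these lookups in isolation, here is a minimal sketch using the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors. The SAMPLE markup is hypothetical, built only from the class names mentioned in this thread, so the real Craigslist page may be structured differently. ElementTree does not accept a bare leading // on an element, so the "whole document" case is shown by searching from the root instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the class names mentioned in this thread;
# the real Craigslist page may differ.
SAMPLE = """
<div>
  <p><span class="itemdate">Jul 16</span><a href="/a.html">Raptor 660 A</a>
     <span class="itempnr"><span class="itempp">$3000</span></span></p>
  <p><span class="itemdate">Jul 15</span><a href="/b.html">Raptor 660 B</a>
     <span class="itempnr"><span class="itempp">$2500</span></span></p>
</div>
"""

root = ET.fromstring(SAMPLE)
rows = root.findall("p")

# Searching the whole document: this is what each row sees with a leading //.
all_prices = [s.text for s in root.findall('.//span[@class="itempp"]')]

# Direct-child lookup from a row: itempp is not a direct child of <p>.
direct = [[s.text for s in p.findall('span[@class="itempp"]')] for p in rows]

# Spelling out the nesting, as suggested above: one price per row.
nested = [
    [s.text for s in p.findall('span[@class="itempnr"]/span[@class="itempp"]')]
    for p in rows
]

# Alternative: a descendant search anchored at the row with a leading dot.
anchored = [[s.text for s in p.findall('.//span[@class="itempp"]')] for p in rows]

print(all_prices)
print(direct)
print(nested)
print(anchored)
```

The last variant suggests another option for the spider: in XPath, .//span[@class="itempp"]/text() searches descendants of the current node only, so it should not depend on knowing the name of the wrapping element.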
