Python: Scrapy CSV exports incorrectly?

Posted 2019-07-24 14:46

I am simply trying to write to a CSV file. However, I have two separate for loops, so the data from each loop is exported independently and the rows come out in the wrong order. Any suggestions?

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    subtext = hxs.select('//td[@class="subtext"]')
    items = []
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        items.append(item)
    for score in subtext:
        item = HackernewsItem()
        item["score"] = score.select("span/text()").extract()
        items.append(item)
    return items

As the image below shows, the scores from the second for loop are printed below all the other rows instead of in the same rows as the titles and URLs, even though the score header sits among the other column headers.

CSV screenshot attached: csv file

GitHub link to the full file: https://github.com/nchlswtsn/scrapy/blob/master/items.csv
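The effect can be reproduced without Scrapy. The sketch below uses `csv.DictWriter` as a stand-in for Scrapy's CSV feed exporter (an assumption; the exporter's exact output may differ slightly) and hypothetical sample data in place of the selector results. Because each loop creates its own items, the title/url items have no score and the score items have no title or URL:

```python
import csv
import io

# Hypothetical scraped values standing in for the selector results.
titles = [("Story A", "http://a.example"), ("Story B", "http://b.example")]
scores = ["10 points", "20 points"]

items = []
for title, url in titles:      # first loop: one item per title, no score
    items.append({"title": title, "url": url})
for score in scores:           # second loop: a separate item per score
    items.append({"score": score})

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "url", "score"])
writer.writeheader()
writer.writerows(items)  # missing keys are written as empty cells (restval="")
print(buf.getvalue())
```

The scores land in their own rows after all the titles, with empty title and URL cells, which matches the layout described above.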

2 answers
三岁会撩人
Reply #2 · 2019-07-24 15:00

The csv module in Python 2.7 does not support Unicode input, so it's recommended to use unicodecsv instead.

$ pip install unicodecsv

unicodecsv is a drop-in replacement for Python 2's csv module that supports Unicode strings without any hassle.

Then use this instead of import csv:

import unicodecsv as csv
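unicodecsv targets Python 2. On Python 3, the standard csv module already works with str, so it handles Unicode as long as the file is opened with an explicit encoding and newline="". A minimal sketch (the file name and sample text are illustrative only):

```python
import csv
import os
import tempfile

# Hypothetical rows containing non-ASCII text.
rows = [["title", "score"], ["Café société: naïve résumé", "42 points"]]

path = os.path.join(tempfile.mkdtemp(), "items.csv")

# Python 3: open in text mode with an explicit encoding; newline="" is what the
# csv docs recommend so the module controls line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)  # True
```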
倾城 Initia
Reply #3 · 2019-07-24 15:15

The order in your CSV file follows directly from the order in which you export the elements: first you export all the titles, then all the subtext elements.
I guess you are trying to scrape Hacker News articles; here is my suggestion:

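def parse(self, response):
    # HtmlXPathSelector is the old Scrapy API; newer versions use response.xpath().
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    items = []
    for title in titles:
        # One item per story, so title, url and score end up in the same CSV row.
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        # Assumes the subtext cell sits in the row right after the title's row.
        item["score"] = title.select('../following-sibling::tr[1]/td[@class="subtext"]/span/text()').extract()
        items.append(item)
    return items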

I haven't tested it, but it should give you the idea.
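If the score can't easily be reached from the title cell with a relative XPath, another option is to keep the two selections from the question and pair them up with zip(), so each item gets all three fields. The sketch below uses plain lists in place of the selector results; the sample data and the pair_rows helper are hypothetical. It assumes both lists line up one-to-one in page order, which may not hold on a real page: if they don't, zip() silently mispairs or drops entries.

```python
import csv
import io


def pair_rows(titles, scores):
    """Combine (title, url) pairs and scores into one dict per story.

    Assumes both lists are in the same order with one entry per story;
    zip() stops at the shorter list without raising an error.
    """
    rows = []
    for (title, url), score in zip(titles, scores):
        rows.append({"title": title, "url": url, "score": score})
    return rows


# Hypothetical values standing in for the two selections.
titles = [("Story A", "http://a.example"), ("Story B", "http://b.example")]
scores = ["10 points", "20 points"]

items = pair_rows(titles, scores)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "url", "score"])
writer.writeheader()
writer.writerows(items)
print(buf.getvalue())
```

Each story now produces a single row with its title, URL and score together.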
