Python: Scrapy CSV exports incorrectly?

I am simply trying to write scraped items to a CSV file. However, I have two separate for loops, so the items from each loop are exported independently and the rows come out in the wrong order. Any suggestions?

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    subtext = hxs.select('//td[@class="subtext"]')
    items = []
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        items.append(item)
    for score in subtext:
        item = HackernewsItem()
        item["score"] = score.select("span/text()").extract()
        items.append(item)
    return items

As the image below shows, the scores from the second for loop are written in rows below the titles and URLs instead of alongside them in the same rows, even though the header row lists all of the columns together.

CSV image attached: csv file

GitHub link to the full file: https://github.com/nchlswtsn/scrapy/blob/master/items.csv

Tags: python csv export scrapy

2 Answers

三岁会撩人

#2 · 2019-07-24 15:00

The built-in csv module in Python 2.7 does not support Unicode, so I suggest using unicodecsv instead.

$ pip install unicodecsv

unicodecsv is a drop-in replacement for Python 2's csv module that supports Unicode strings without any hassle.

Then use this in place of import csv:

import unicodecsv as csv
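
For comparison, here is a minimal sketch of writing non-ASCII rows. It assumes Python 3, where the built-in csv module works on Unicode strings directly and no extra package is needed. The sample rows are made up for illustration. On Python 2.7 the same writerow/writerows calls would go through the unicodecsv import above.

```python
import csv
import io

# Made-up sample rows containing non-ASCII text (illustrative only).
rows = [
    ["title", "score"],
    ["Café déjà vu", "12 points"],
    ["日本語のタイトル", "7 points"],
]

# On Python 3 the csv module operates on str (Unicode) values directly.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)
output = buffer.getvalue()

# Encoding to bytes happens separately, for example as UTF-8 when saving to disk.
encoded = output.encode("utf-8")
```

When writing to a real file on Python 3, open it with open(path, "w", newline="", encoding="utf-8") so the csv module controls the line endings.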


倾城　Initia

#3 · 2019-07-24 15:15

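The order of the rows in your CSV file follows the order in which you export the items: first you export all the titles, then all the subtext elements. To get the title, URL, and score in the same row, fill all three fields on a single item.
I guess you are trying to scrape HN articles; here is my suggestion: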

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    items = []
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        item["score"] = title.select('../td[@class="subtext"]/span/text()').extract()
        items.append(item)
    return items

I didn't test it, but it should give you the idea.
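
The same idea can be sketched with only the standard library: build one complete item per story, so that each CSV row carries the title, URL, and score together. Everything specific here is an assumption for illustration. The markup is a simplified, hypothetical stand-in for the real page, and the sketch pairs the n-th title cell with the n-th subtext cell by position, which only holds if both lists come back in matching order. In a spider, the two lists would come from the selectors instead.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (not the real page): each story has a
# "title" cell, and its "subtext" cell sits in the following table row.
SAMPLE = """
<table>
  <tr><td class="title"><a href="http://example.com/a">Story A</a></td></tr>
  <tr><td class="subtext"><span>10 points</span></td></tr>
  <tr><td class="title"><a href="http://example.com/b">Story B</a></td></tr>
  <tr><td class="subtext"><span>25 points</span></td></tr>
</table>
"""

def build_rows(markup):
    """Return one dict per story with title, url, and score filled in."""
    root = ET.fromstring(markup.strip())
    titles = root.findall(".//td[@class='title']")
    subtexts = root.findall(".//td[@class='subtext']")
    rows = []
    # zip pairs cells by position, so every row is complete before it is stored.
    for title, subtext in zip(titles, subtexts):
        link = title.find("a")
        rows.append({
            "title": link.text,
            "url": link.get("href"),
            "score": subtext.find("span").text,
        })
    return rows

def to_csv(rows):
    """Serialize the rows to CSV text with a single header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["title", "url", "score"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = build_rows(SAMPLE)
csv_text = to_csv(rows)
```

Because each dict is filled in completely before it is appended, the exporter never sees a row that has only a score or only a title, which is what caused the split output in the question.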
