I am simply trying to write to a CSV file. However, I have two separate for loops, so the data from each loop is exported independently and the row order breaks. Any suggestions?
    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select('//td[@class="title"]')
        subtext = hxs.select('//td[@class="subtext"]')
        items = []
        for title in titles:
            item = HackernewsItem()
            item["title"] = title.select("a/text()").extract()
            item["url"] = title.select("a/@href").extract()
            items.append(item)
        for score in subtext:
            item = HackernewsItem()
            item["score"] = score.select("span/text()").extract()
            items.append(item)
        return items
As the CSV output shows, the scores from the second for loop are written below all the title rows instead of alongside them, even though the score column appears in the header next to title and url.
The full output file is on GitHub: https://github.com/nchlswtsn/scrapy/blob/master/items.csv
The csv module in Python 2.7 does not support Unicode input, so I'd suggest using unicodecsv instead.
Then use this instead of:

    import csv
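On Python 3 this workaround is not needed: the built-in csv module reads and writes str, which is already Unicode. A minimal sketch, assuming Python 3 and using an in-memory io.StringIO buffer in place of a real file (the rows are made-up sample data):

```python
import csv
import io

# Hypothetical rows containing non-ASCII text.
rows = [["title", "score"], ["Café ☕ launches", "42 points"]]

# io.StringIO stands in for a file opened with
# open("items.csv", "w", newline="", encoding="utf-8").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)

# Read the text back to confirm the non-ASCII characters survived.
buffer.seek(0)
round_trip = list(csv.reader(buffer))
print(round_trip)
```

With a real file, opening it with newline="" and an explicit encoding is what the csv module documentation recommends.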
The order in your CSV file is exactly the order in which you export the items: first you export all the titles, then all the subtext elements. Each loop creates its own items, so no single item ever holds both a title and a score.
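The effect can be reproduced without Scrapy. In this sketch plain dicts stand in for HackernewsItem and the titles and scores are made-up sample data; the two loops mirror the ones in the question:

```python
# Plain dicts stand in for HackernewsItem; the data is made up.
titles = [("Story A", "http://a.example"), ("Story B", "http://b.example")]
scores = ["10 points", "20 points"]

items = []
# First loop: one item per title, with no score yet.
for title, url in titles:
    items.append({"title": title, "url": url})
# Second loop: one separate item per score, with no title.
for score in scores:
    items.append({"score": score})

for item in items:
    print(item)
```

The list ends up with four half-filled items, so the exporter writes two rows with only title and url, followed by two rows with only score.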
I guess you are trying to scrape HN articles; here is my suggestion:
I didn't test it, but it will give you an idea.
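One common way to keep the fields together is to walk both selections in a single loop with zip(), so each item gets the title, url and score of the same story. This is a plain-Python sketch, not the answer's original code: the titles and scores lists are hypothetical stand-ins for the values extracted from the td.title and td.subtext cells. In the spider, the idea corresponds roughly to one loop of the form `for title, sub in zip(titles, subtext):` that fills a single HackernewsItem per pair.

```python
import csv
import io

# Hypothetical stand-ins for the values extracted from each
# td.title and td.subtext cell, in page order.
titles = [("Story A", "http://a.example"), ("Story B", "http://b.example")]
scores = ["10 points", "20 points"]

# zip() walks both lists in step, so each item holds the title, url
# and score of the same story instead of being split across two items.
items = [
    {"title": title, "url": url, "score": score}
    for (title, url), score in zip(titles, scores)
]

# Write one complete row per story.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "url", "score"])
writer.writeheader()
writer.writerows(items)
print(buffer.getvalue())
```

This assumes the two selections line up one-to-one; zip() silently stops at the shorter list, so if the page contains extra td.title cells (for example, ones without a link), filter them out before pairing.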