I'm using a basic spider that gets particular information from links on a website. My code looks like this:
import scrapy
from scrapy import Request
import urllib.parse as urlparse
from properties import PropertiesItem, ItemLoader
from scrapy.crawler import CrawlerProcess

class BasicSpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    start_urls = ['http://www.example.com']
    objectList = []

    def parse(self, response):
        # Get item URLs and yield Requests
        item_selector = response.xpath('//*[@class="example"]//@href')
        for url in item_selector.extract():
            yield Request(urlparse.urljoin(response.url, url),
                          callback=self.parse_item, dont_filter=True)

    def parse_item(self, response):
        L = ItemLoader(item=PropertiesItem(), response=response)
        L.add_xpath('title', '//*[@class="example"]/text()')
        L.add_xpath('adress', '//*[@class="example"]/text()')
        return L.load_item()

process = CrawlerProcess()
process.crawl(BasicSpider)
process.start()
What I want now is to append every class instance "L" to a list called objectList. I've tried to do so by altering the code like this:
def parse_item(self, response):
    global objectList
    l = ItemLoader(item=PropertiesItem(), response=response)
    l.add_xpath('title', '//*[@class="restaurantSummary-name"]/text()')
    l.add_xpath('adress', '//*[@class="restaurantSummary-address"]/text()')
    item = l.load_item()
    objectList.append([item.title, item.adress])
    return objectList
But when I run this code I get an error saying:
l = ItemLoader(item=PropertiesItem(), response=response)
NameError: name 'PropertiesItem' is not defined
Q: How do I append every item that the scraper finds to the list objectList?
EDIT:
I want to store the results in a list, because I can then save the results like this:
import pandas as pd
table = pd.DataFrame(objectList)
writer = pd.ExcelWriter('DataAll.xlsx')
table.to_excel(writer, 'sheet 1')
writer.save()
To save results you should use Scrapy's Feed Exports feature, as described in the documentation here.
See the CSV section for your case.
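For instance, the built-in CSV exporter can be enabled straight from the command line when the spider is run with the scrapy CLI instead of CrawlerProcess (the output filename is arbitrary):

```shell
scrapy crawl basic -o items.csv
```

Every item the spider yields is then appended to items.csv automatically, with the item fields as columns.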
Another, more custom approach would be to use Scrapy's Item Pipelines. There's an example of a simple JSON writer here that could easily be modified to output CSV or any other format.
For example, a small Item Pipeline can write every scraped item to a test.csv file in the project directory.
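A minimal sketch of such a CSV-writing pipeline, modelled on the JsonWriterPipeline from the Scrapy documentation (the class name is an assumption; the field names 'title' and 'adress' match the spider above):

```python
import csv

class CsvWriterPipeline:
    """Write every scraped item as a row of test.csv."""

    def open_spider(self, spider):
        # Called once when the spider starts: open the file and write a header
        self.file = open("test.csv", "w", newline="")
        self.writer = csv.writer(self.file)
        self.writer.writerow(["title", "adress"])

    def close_spider(self, spider):
        # Called once when the spider finishes
        self.file.close()

    def process_item(self, item, spider):
        # Called for every item the spider yields; .get() works for both
        # dicts and scrapy.Item objects
        self.writer.writerow([item.get("title"), item.get("adress")])
        return item
```

To activate it, register it in settings.py, e.g. ITEM_PIPELINES = {'myproject.pipelines.CsvWriterPipeline': 300} (the module path is an assumption about your project layout).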