I have multiple spiders within one scraping program. I am trying to run all of the spiders simultaneously from a script and then dump the contents to a JSON file. When I run each spider individually from the shell with -o xyz.json, it works fine.
I've attempted to follow this fairly thorough answer: How to create custom Scrapy Item Exporter?
When I run the file, I can see it gather the data in the shell, but nothing is written to the output file.
Below I've copied, in order: Exporter, Pipeline, Settings.
Exporter:
from scrapy.exporters import JsonItemExporter


class XYZExport(JsonItemExporter):
    def __init__(self, file, **kwargs):
        super().__init__(file)

    def start_exporting(self):
        self.file.write(b)

    def finish_exporting(self):
        self.file.write(b)
I'm struggling to determine what goes inside the parentheses of self.file.write.
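For context on the write calls: because the pipeline opens the file with 'wb', the argument has to be a bytes object, and a bare `b` is only an undefined name. Scrapy's built-in `JsonItemExporter` writes `b"["` in `start_exporting` and `b"]"` in `finish_exporting`, so removing the two overrides should leave that behaviour in place. The sketch below shows the same framing using only the standard library. `TinyJsonArrayWriter` and `demo` are hypothetical names for illustration, not part of Scrapy.

```python
import io
import json


class TinyJsonArrayWriter:
    """Hypothetical stand-in (not Scrapy's class) showing JSON-array framing in bytes."""

    def __init__(self, file):
        self.file = file  # binary file-like object, e.g. open(path, "wb")
        self.first_item = True

    def start_exporting(self):
        self.file.write(b"[")  # opening bracket of the JSON array

    def export_item(self, item):
        if not self.first_item:
            self.file.write(b",")  # separator between consecutive items
        self.first_item = False
        self.file.write(json.dumps(item).encode("utf-8"))

    def finish_exporting(self):
        self.file.write(b"]")  # closing bracket of the JSON array


def demo():
    """Write two items to an in-memory buffer and return the raw bytes."""
    buffer = io.BytesIO()
    writer = TinyJsonArrayWriter(buffer)
    writer.start_exporting()
    writer.export_item({"name": "a"})
    writer.export_item({"name": "b"})
    writer.finish_exporting()
    return buffer.getvalue()
```

The bytes returned by `demo()` parse as a JSON list, which is the shape the -o xyz.json output has as well.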
Pipeline:
from exporters import XYZExport


class XYZExport(object):
    def __init__(self, file_name):
        self.file_name = file_name
        self.file_handle = None

    @classmethod
    def from_crawler(cls, crawler):
        output_file_name = crawler.settings.get('FILE_NAME')
        return cls(output_file_name)

    def open_spider(self, spider):
        print('Custom export opened')
        file = open(self.file_name, 'wb')
        self.file_handle = file
        self.exporter = XYZExport(file)
        self.exporter.start_exporting()

    def close_spider(self, spider):
        print('Custom Exporter closed')
        self.exporter.finish_exporting()
        self.file_handle.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
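One thing worth checking in the pipeline as posted: the class is defined with the same name as the exporter it imports, and in Python the later `class` statement rebinds that name. If the real project also shares the name (the names here may have been changed for posting), `XYZExport(file)` inside `open_spider` would build another pipeline object instead of the exporter. The stdlib-only sketch below reproduces that effect with stand-in classes. `imported_exporter` and `shadowing_error` are hypothetical helpers added for the demonstration.

```python
import io


class XYZExport:
    """Stand-in for the exporter that `from exporters import XYZExport` would bring in."""

    def __init__(self, file):
        self.file = file

    def start_exporting(self):
        self.file.write(b"[")


imported_exporter = XYZExport  # keep a handle on the first class for comparison


class XYZExport:  # same name again: XYZExport now refers to this pipeline class
    def __init__(self, file_name):
        self.file_name = file_name

    def open_spider(self, spider):
        # Intended to build the exporter, but the name now resolves to the pipeline.
        self.exporter = XYZExport(io.BytesIO())
        self.exporter.start_exporting()


def shadowing_error():
    """Return the error message raised when the shadowed name is used, if any."""
    try:
        XYZExport("out.json").open_spider(spider=None)
    except AttributeError as exc:
        return str(exc)
    return None
```

Giving the pipeline a distinct name (for example a hypothetical `XYZExportPipeline`) and using that name in ITEM_PIPELINES would avoid the clash.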
Settings:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
FILE_NAME = r'C:\Apps Ive Built\WebScrape Python\XYZ\ScrapeOutput.json'
ITEM_PIPELINES = {
    'XYZ.pipelines.XYZExport': 600,
}
I hope (and fear) it's a simple omission, because that seems to be my MO, but I'm very new to scraping and this is the first time I've tried to do it this way.
If there is a more stable way to export this data, I'm all ears. Otherwise, can you tell me what I've missed that is preventing the data from being exported, or preventing the exporter from being called properly?
[Edited to change the pipeline name in settings]