Using Scrapy JsonItemsLinesExporter, returns no va

Posted 2019-08-06 16:03

I have multiple spiders within one Scrapy project. I am trying to run all of them simultaneously from a script and then dump the scraped contents to a JSON file. When I run each spider individually from the shell with -o xyz.json, it works fine.

I've tried to follow this fairly thorough answer: How to create custom Scrapy Item Exporter?

However, when I run the script I can see the data being scraped in the shell output, but nothing is written to the output file.

Below, in order, are my exporter, pipeline, and settings:

Exporter:

from scrapy.exporters import JsonItemExporter

class XYZExport(JsonItemExporter):

    def __init__(self, file, **kwargs):
        super().__init__(file)

    def start_exporting(self):
        self.file.write(b)

    def finish_exporting(self):
        self.file.write(b)

I'm struggling to work out what should go inside the self.file.write() parentheses.
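For reference: Scrapy's own JsonItemExporter already writes the opening b'[' in start_exporting, the commas between items in export_item, and the closing b']' in finish_exporting. Because the pipeline opens the file with 'wb', anything passed to self.file.write() has to be bytes (e.g. b'['), not str. With the real class, overriding those two methods is therefore only needed to change that behaviour; otherwise the subclass can forward its arguments with super().__init__(file, **kwargs) and inherit the rest. The stdlib-only sketch below mimics that layout with a hypothetical BracketedJsonExporter class (a stand-in, not Scrapy's class) so the bytes each method writes are visible.

```python
import io
import json


class BracketedJsonExporter:
    """Hypothetical stand-in mimicking how JsonItemExporter lays out a JSON array."""

    def __init__(self, file):
        self.file = file  # a binary file object, e.g. open(path, 'wb')
        self.first_item = True

    def start_exporting(self):
        # Opening bracket of the array; bytes, because the file is binary
        self.file.write(b'[')

    def export_item(self, item):
        # Comma before every item except the first
        if self.first_item:
            self.first_item = False
        else:
            self.file.write(b',\n')
        self.file.write(json.dumps(dict(item)).encode('utf-8'))

    def finish_exporting(self):
        # Closing bracket of the array
        self.file.write(b']')


buffer = io.BytesIO()
exporter = BracketedJsonExporter(buffer)
exporter.start_exporting()
exporter.export_item({'title': 'first'})
exporter.export_item({'title': 'second'})
exporter.finish_exporting()
print(buffer.getvalue().decode('utf-8'))
```

If start_exporting or finish_exporting writes nothing (or raises, as write(b) with an undefined b would), the file cannot be a valid JSON array.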

Pipeline:

from exporters import XYZExport

class XYZExport(object):
    def __init__(self, file_name):
        self.file_name = file_name
        self.file_handle = None

    @classmethod
    def from_crawler(cls, crawler):
        output_file_name = crawler.settings.get('FILE_NAME')

        return cls(output_file_name)

    def open_spider(self, spider):
        print('Custom export opened')

        file = open(self.file_name, 'wb')
        self.file_handle = file


        self.exporter = XYZExport(file)
        self.exporter.start_exporting()

    def close_spider(self, spider):
        print('Custom Exporter closed')

        self.exporter.finish_exporting()

        self.file_handle.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
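One thing worth checking in this pipeline: the module imports XYZExport from exporters and then defines its own class with the same name, so at module level XYZExport now refers to the pipeline. Inside open_spider, XYZExport(file) therefore builds another pipeline object, which has no start_exporting method, and the call raises AttributeError before anything is written (assuming the bare "from exporters import" line resolves at all in the project layout). The self-contained sketch below reproduces that rebinding with stand-in classes and then runs the same open/process/close lifecycle with distinct names. JsonFileExporter, XYZExportPipeline, FakeSpider and the per-spider file name are all hypothetical; the per-spider name is there because each spider running in the process gets its own pipeline instance, and several instances opening the same path with 'wb' would truncate each other's output.

```python
import json
import os
import tempfile


# --- Reproducing the name clash with stand-in classes ---
class XYZExport:
    """Stand-in for the exporter that pipelines.py imports."""

    def start_exporting(self):
        pass


class XYZExport:  # same name again, as in pipelines.py: rebinds XYZExport
    def __init__(self, file_name):
        self.file_name = file_name


shadowed = XYZExport(None)  # what XYZExport(file) inside open_spider creates
print(hasattr(shadowed, 'start_exporting'))  # False


# --- The same lifecycle with distinct (hypothetical) names ---
class JsonFileExporter:
    """Hypothetical exporter; buffers items in memory for brevity."""

    def __init__(self, file):
        self.file = file
        self.items = []

    def start_exporting(self):
        self.items = []

    def export_item(self, item):
        self.items.append(dict(item))

    def finish_exporting(self):
        # Write the whole array as bytes, since the file is opened 'wb'
        self.file.write(json.dumps(self.items).encode('utf-8'))


class XYZExportPipeline:
    """Hypothetical pipeline name that no longer hides the exporter class."""

    def __init__(self, file_name):
        self.file_name = file_name
        self.file_handle = None
        self.exporter = None

    def open_spider(self, spider):
        # Hypothetical per-spider file name, e.g. ScrapeOutput_demo.json
        root, ext = os.path.splitext(self.file_name)
        self.file_handle = open(f'{root}_{spider.name}{ext}', 'wb')
        self.exporter = JsonFileExporter(self.file_handle)
        self.exporter.start_exporting()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file_handle.close()


class FakeSpider:
    """Minimal stand-in for a Scrapy spider; only .name is used."""
    name = 'demo'


out_dir = tempfile.mkdtemp()
pipeline = XYZExportPipeline(os.path.join(out_dir, 'ScrapeOutput.json'))
spider = FakeSpider()
pipeline.open_spider(spider)
for item in ({'title': 'first'}, {'title': 'second'}):
    pipeline.process_item(item, spider)
pipeline.close_spider(spider)

with open(os.path.join(out_dir, 'ScrapeOutput_demo.json'), 'rb') as f:
    result = json.load(f)
print(result)
```

If the pipeline class is renamed like this, the ITEM_PIPELINES entry in settings has to use the new name as well.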

Settings:

FILE_NAME = 'C:\Apps Ive Built\WebScrape Python\XYZ\ScrapeOutput.json'
ITEM_PIPELINES = {
    'XYZ.pipelines.XYZExport': 600,
}
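A side note on FILE_NAME: it is a normal string literal, so each backslash starts an escape sequence. It happens to work here because \A, \W, \X and \S are not recognised escapes, but Python warns about invalid escapes (DeprecationWarning since 3.6, SyntaxWarning since 3.12), and a folder name starting with n or t would silently become a newline or tab. A raw string avoids this. The sketch below uses pathlib.PureWindowsPath only to check that the path splits as intended on any OS.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally, no escape processing
FILE_NAME = r'C:\Apps Ive Built\WebScrape Python\XYZ\ScrapeOutput.json'

path = PureWindowsPath(FILE_NAME)
print(path.name)         # ScrapeOutput.json
print(path.parent.name)  # XYZ

# Without the r prefix, \n is a newline: 5 characters instead of 6
print(len('C:\new'), len(r'C:\new'))  # 5 6
```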

I hope (and fear) it's a simple omission, because that seems to be my MO, but I'm very new to scraping and this is the first time I've tried to do it this way.

If there is a more reliable way to export this data, I'm all ears. Otherwise, can you tell me what I've missed that is preventing the data from being exported, or preventing the exporter from being called properly?
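On the "more reliable way" point: the -o xyz.json option that works from the shell is Scrapy's built-in feed export, and the same thing can be configured in settings.py so it also applies when the spiders are started from a script, without a custom exporter or pipeline. The fragment below is a sketch assuming Scrapy 2.1 or later for the FEEDS setting (older releases, including those current in 2019, used FEED_FORMAT = 'json' and FEED_URI instead); %(name)s is replaced with each spider's name, so spiders running at the same time write to separate files. Whichever approach is used, the script has to load the project settings, e.g. CrawlerProcess(get_project_settings()); a bare CrawlerProcess() does not read settings.py, so neither ITEM_PIPELINES nor FEEDS would take effect.

```python
# settings.py (sketch): built-in JSON feed export, one file per spider.
# Assumes Scrapy 2.1+; the 'output/' directory name is hypothetical.
FEEDS = {
    'output/%(name)s.json': {
        'format': 'json',
        'encoding': 'utf8',
    },
}
```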

[Edited to change the pipeline name in settings]
