scrapy: access spider class variable in pipeline

Posted 2019-07-26 05:08

I know you can access spider variables in process_item(), but how can I access them in a pipeline's __init__()?

class SiteSpider(CrawlSpider):
    def __init__(self):
        self.id = 10

class MyPipeline(object):
    def __init__(self):
        ...

I also need to access CUSTOM_SETTINGS_VARIABLE in MyPipeline.
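For reference, the approach the question already mentions can be sketched like this (plain Python stand-ins are used here instead of real Scrapy base classes, so this runs without Scrapy installed; the pattern is the same because Scrapy passes the running spider instance to every process_item() call):

```python
# Minimal sketch of accessing a spider attribute in process_item().
# SiteSpider here is a plain class standing in for a CrawlSpider subclass.

class SiteSpider:
    name = "site"

    def __init__(self):
        self.id = 10


class MyPipeline:
    def process_item(self, item, spider):
        # The spider instance is passed in, so attributes set in its
        # __init__ are reachable here.
        item["spider_id"] = spider.id
        return item


pipeline = MyPipeline()
result = pipeline.process_item({}, SiteSpider())
print(result)  # {'spider_id': 10}
```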

1 Answer
Luminary
Answered 2019-07-26 05:39

You can't access the spider instance in the pipeline's __init__, because pipelines are initialized when the engine starts, before any spider is opened. Keep in mind that a single pipeline instance can serve multiple spiders, not just one.

Having said that, you can hook the spider_opened signal to access the spider instance when it starts.

from scrapy import signals


class MyPipeline(object):

    def __init__(self, mysetting):
        # do stuff with the arguments...
        self.mysetting = mysetting

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        instance = cls(settings['CUSTOM_SETTINGS_VARIABLE'])
        crawler.signals.connect(instance.spider_opened, signal=signals.spider_opened)
        return instance

    def spider_opened(self, spider):
        # The spider instance is available here: grab what you need from it
        # and initialize any resources.
        self.spider_id = spider.id
        spider.log("[MyPipeline] Initializing resources for %s" % spider.name)

    def process_item(self, item, spider):
        return item
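For completeness, a sketch of the settings this relies on. The setting name matches the question; the module path of the pipeline is an assumption, adjust it to your project:

```python
# settings.py (sketch; the pipeline module path is a placeholder)
CUSTOM_SETTINGS_VARIABLE = "some-value"

ITEM_PIPELINES = {
    "myproject.pipelines.MyPipeline": 300,
}
```

With this in place, crawler.settings['CUSTOM_SETTINGS_VARIABLE'] in from_crawler() resolves to "some-value".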