scrapy: access spider class variable in pipeline

Posted 2019-07-26 05:08

I know you can access spider variables in process_item(), but how can I access them in a pipeline's __init__()?

class SiteSpider(CrawlSpider):
    def __init__(self):
        self.id = 10

class MyPipeline(object):
    def __init__(self):
        ...

I also need to access CUSTOM_SETTINGS_VARIABLE in MyPipeline.
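For reference, the approach the question already mentions can be sketched like this (plain Python stand-ins are used here instead of real Scrapy base classes, so this runs without Scrapy installed; the pattern is the same because Scrapy passes the running spider instance to every process_item() call):

```python
# Minimal sketch of accessing a spider attribute in process_item().
# SiteSpider here is a plain class standing in for a CrawlSpider subclass.

class SiteSpider:
    name = "site"

    def __init__(self):
        self.id = 10


class MyPipeline:
    def process_item(self, item, spider):
        # The spider instance is passed in, so attributes set in its
        # __init__ are reachable here.
        item["spider_id"] = spider.id
        return item


pipeline = MyPipeline()
result = pipeline.process_item({}, SiteSpider())
print(result)  # {'spider_id': 10}
```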

1 Answer
Luminary
Answered 2019-07-26 05:39

You can't access the spider instance in the pipeline's __init__, because pipelines are initialized when the engine starts, before any spider is opened. Keep in mind that a single pipeline instance can serve multiple spiders, not just one.

Having said that, you can hook the spider_opened signal to access the spider instance when it starts.

from scrapy import signals


class MyPipeline(object):

    def __init__(self, mysetting):
        # do stuff with the arguments...
        self.mysetting = mysetting

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        instance = cls(settings['CUSTOM_SETTINGS_VARIABLE'])
        crawler.signals.connect(instance.spider_opened, signal=signals.spider_opened)
        return instance

    def spider_opened(self, spider):
        # The spider instance is available here: grab what you need from it
        # and initialize any resources.
        self.spider_id = spider.id
        spider.log("[MyPipeline] Initializing resources for %s" % spider.name)

    def process_item(self, item, spider):
        return item
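For completeness, a sketch of the settings this relies on. The setting name matches the question; the module path of the pipeline is an assumption, adjust it to your project:

```python
# settings.py (sketch; the pipeline module path is a placeholder)
CUSTOM_SETTINGS_VARIABLE = "some-value"

ITEM_PIPELINES = {
    "myproject.pipelines.MyPipeline": 300,
}
```

With this in place, crawler.settings['CUSTOM_SETTINGS_VARIABLE'] in from_crawler() resolves to "some-value".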