I'm working with Scrapy. I have a pipeline that starts with:
class DynamicSQLlitePipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        # Here, you get whatever value was passed through the "table" parameter
        table = getattr(crawler.spider, "table")
        return cls(table)

    def __init__(self, table):
        try:
            db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
            db = dataset.connect(db_path)
            table_name = table[0:3]  # FIRST 3 LETTERS
            self.my_table = db[table_name]
I've been reading through https://doc.scrapy.org/en/latest/topics/api.html#crawler-api, which contains:
The main entry point to Scrapy API is the Crawler object, passed to extensions through the from_crawler class method. This object provides access to all Scrapy core components, and it’s the only way for extensions to access them and hook their functionality into Scrapy.
but I still do not understand the from_crawler method and the crawler object. What is the relationship between the crawler object and the spider and pipeline objects? How and when is a crawler instantiated? Is a spider a subclass of crawler? I've asked Passing scrapy instance (not class) attribute to pipeline, but I still don't understand how the pieces fit together.
Crawler is actually one of the most important objects in Scrapy's architecture. It is a central piece of the crawling execution logic which "glues" a lot of other pieces together: a crawler (or multiple crawlers) is controlled by a CrawlerRunner or a CrawlerProcess instance.
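To make that relationship concrete, here is a minimal sketch (MySpider is a hypothetical placeholder for your own spider): a CrawlerProcess builds a Crawler object around a spider class, and the spider instance itself only appears once the crawl actually starts. In other words, a spider is not a subclass of Crawler; it is created and owned by one.

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    name = "my_spider"  # hypothetical spider standing in for yours

process = CrawlerProcess()

# create_crawler() wraps the spider *class* in a Crawler object;
# the spider *instance* is only created once the crawl actually starts
crawler = process.create_crawler(MySpider)

print(crawler.spidercls)                 # the spider class this crawler will run
print(crawler.settings.get("BOT_NAME"))  # the settings the crawler was built with
print(crawler.spider)                    # None - no spider instance yet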
Now, the from_crawler method, which is available on lots of Scrapy components, is just a way for these components to get access to the crawler instance that is running this particular component.
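Applied to the pipeline in the question, that means from_crawler receives the Crawler that is running the spider and can pull anything it needs off it: crawler.spider (and therefore the table attribute), crawler.settings, crawler.signals, and so on. A rough sketch follows - the DB_PATH setting name is made up purely for illustration; your code builds the path from settings.SETTINGS_PATH instead:

import dataset

class DynamicSQLlitePipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        # crawler.spider is the spider instance this crawler is running, so
        # arguments passed with "-a table=..." show up here as attributes
        table = getattr(crawler.spider, "table")
        # crawler.settings exposes the crawler's settings; DB_PATH is a
        # hypothetical custom setting used only in this sketch
        db_path = crawler.settings.get("DB_PATH", "sqlite:///data.db")
        return cls(table, db_path)

    def __init__(self, table, db_path):
        db = dataset.connect(db_path)
        self.my_table = db[table[0:3]]  # first 3 letters, as in the question

    def process_item(self, item, spider):
        self.my_table.insert(dict(item))
        return item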
Also, look at the actual implementations of Crawler, CrawlerRunner and CrawlerProcess. And, what I personally found helpful in order to better understand how Scrapy works internally was to run a spider from a script - check out these detailed step-by-step instructions.
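Even a bare-bones script like the sketch below (the spider, URL and table value are placeholders) exercises the whole chain: process.crawl() builds a Crawler for the spider class, the Crawler instantiates the spider with the table argument, and each pipeline's from_crawler is handed that same crawler.

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    name = "my_spider"                    # placeholder spider
    start_urls = ["http://example.com"]   # placeholder URL

    def parse(self, response):
        yield {"url": response.url, "table": self.table}

process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})

# "table" travels the same way "-a table=..." does on the command line:
# it becomes an attribute of the spider instance, which is exactly what
# the pipeline later reads via crawler.spider.table
process.crawl(MySpider, table="products")
process.start()   # blocks here until the crawl is finished

To actually plug your pipeline in, you would also add ITEM_PIPELINES to the settings dict, pointing at the pipeline's import path.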