I am using Scrapy to crawl some websites. How can I get the number of requests in the queue?
I have looked at the Scrapy source code and found that `scrapy.core.scheduler.Scheduler` may lead to my answer. See: https://github.com/scrapy/scrapy/blob/0.24/scrapy/core/scheduler.py
Two questions:
- How do I access the scheduler in my spider class?
- What do `self.dqs` and `self.mqs` mean in the scheduler class?
This took me a while to figure out, but here's what I used:
`self.crawler.engine.slot.scheduler`

That is the instance of the scheduler. You can then call its `__len__()` method, or, if you just need a true/false answer for pending requests, use the scheduler's `has_pending_requests()` method (see the sketch below). Beware that there could still be running requests even though the queue is empty; to check how many requests are currently in flight, look at the engine slot's `inprogress` set (also in the sketch below).
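A minimal sketch combining these, assuming it runs inside a spider callback against Scrapy 0.24-era internals (`crawler.engine.slot` is not a stable public API, so attribute names may differ in other versions):

```python
def parse(self, response):
    slot = self.crawler.engine.slot
    scheduler = slot.scheduler

    # Total number of requests still enqueued (memory + disk queues).
    queued = len(scheduler)

    # True/False: is anything still waiting in the queue?
    pending = scheduler.has_pending_requests()

    # The queue can be empty while requests are still in flight;
    # the engine slot tracks those in its `inprogress` set.
    in_flight = len(slot.inprogress)

    self.log("queued=%d pending=%s in-flight=%d"
             % (queued, pending, in_flight))
```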
An approach to answering your questions:
From the documentation (http://readthedocs.org/docs/scrapy/en/0.14/faq.html#does-scrapy-crawl-in-breath-first-or-depth-first-order), `self.dqs` and `self.mqs` are self-explanatory: the disk queue scheduler and the memory queue scheduler (a sketch of counting each is below).

Another SO answer (Storing scrapy queue in a database) suggests accessing Scrapy's internal queue representation, queuelib: https://github.com/scrapy/queuelib. Once you get hold of it, you just need to count the length of the queue.
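Two sketches of that idea, assuming the Scrapy 0.24 internals linked in the question. First, counting each scheduler queue from inside a spider callback (`dqs` is `None` unless a job directory is configured):

```python
scheduler = self.crawler.engine.slot.scheduler

in_memory = len(scheduler.mqs)                        # memory queue scheduler
on_disk = len(scheduler.dqs) if scheduler.dqs else 0  # disk queue (JOBDIR only)
total = in_memory + on_disk                           # this is what len(scheduler) returns
```

Second, queuelib queues used standalone support `len()` directly (the path here is made up):

```python
from queuelib import FifoDiskQueue

q = FifoDiskQueue("somequeue")  # queuelib stores the queue files at this path
q.push(b"a-serialized-request")
print(len(q))  # -> 1
q.close()
```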