Scrapy Crawling Speed is Slow (60 pages / min)

Posted 2020-05-24 05:04

Question:

I am experiencing slow crawl speeds with Scrapy (around 1 page / sec). I'm crawling a major website from AWS servers, so I don't think it's a network issue. CPU utilization is nowhere near 100%, and if I start multiple Scrapy processes the crawl speed is much faster.

Scrapy seems to crawl a bunch of pages, then hangs for several seconds, and then repeats.

I've tried playing with: CONCURRENT_REQUESTS = CONCURRENT_REQUESTS_PER_DOMAIN = 500

but this doesn't really seem to move the needle past about 20.
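For reference, the settings described above would sit in the project's settings.py roughly as follows. This is only a sketch: the two concurrency values come from the question, while the delay and AutoThrottle lines are assumptions added to show other Scrapy settings that can limit throughput regardless of the concurrency values.

```python
# settings.py (sketch; only the two concurrency values are from the question)

# Upper bound on requests in flight across the whole crawler.
CONCURRENT_REQUESTS = 500

# Upper bound on requests in flight to any single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 500

# Assumption: no fixed delay between requests to the same site.
# A non-zero value here throttles the crawl no matter how high
# the concurrency settings are.
DOWNLOAD_DELAY = 0

# Assumption: AutoThrottle switched off. When enabled, it adjusts
# the delay dynamically based on observed response latency.
AUTOTHROTTLE_ENABLED = False
```

With a single target site, the per-domain limit is the one that matters in practice, since all requests go to the same domain.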

Answer 1:

Are you sure you are allowed to crawl the destination site at high speed? Many sites implement download thresholds (rate limits) and, after a while, start responding slowly.
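One rough way to check whether the site is slowing you down is to compare the latency of early responses with recent ones. The sketch below is an illustration only: looks_throttled, window, and factor are hypothetical names and values, not part of Scrapy. It assumes you collect per-response latencies yourself, for example from response.meta['download_latency'] in a spider callback.

```python
from statistics import median


def looks_throttled(latencies, window=20, factor=3.0):
    """Return True if recent responses are much slower than the first ones.

    latencies: response times in seconds, in the order they were received.
    window:    how many samples to take from each end (hypothetical default).
    factor:    how many times slower counts as throttled (hypothetical default).
    """
    if len(latencies) < 2 * window:
        return False  # not enough samples to compare both ends
    early = median(latencies[:window])
    recent = median(latencies[-window:])
    return recent > factor * early


# Example: 20 fast responses followed by 20 slow ones.
samples = [0.1] * 20 + [2.0] * 20
print(looks_throttled(samples))
```

If the check fires consistently after the first burst of pages, the bottleneck is probably on the server side, and raising concurrency further is unlikely to help.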