Scrapy Crawling Speed is Slow (60 pages / min)

2020-05-24 05:14发布

I am experiencing slow crawl speeds with scrapy (around 1 page / sec). I'm crawling a major website from aws servers so I don't think its a network issue. Cpu utilization is nowhere near 100 and if I start multiple scrapy processes crawl speed is much faster.

Scrapy seems to crawl a bunch of pages, then hangs for several seconds, and then repeats.

I've tried playing with: CONCURRENT_REQUESTS = CONCURRENT_REQUESTS_PER_DOMAIN = 500

but this doesn't really seem to move the needle past about 20.

标签： python http scrapy web-crawler

1条回答

Evening l夕情丶

2楼-- · 2020-05-24 05:28

Are you sure you are allowed to crawl the destination site at high speed? Many sites implement download threshold and "after a while" start responding slowly.

0人赞添加讨论(0) 举报

Scrapy Crawling Speed is Slow (60 pages / min)

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间