I am using the Scrapy framework to make spiders crawl some webpages. Basically, what I want is to scrape web pages and save them to a database. I have one spider per webpage. But I am having trouble running those spiders so that one spider starts crawling exactly after another spider finishes. How can that be achieved? Is scrapyd the solution?
scrapyd is indeed a good way to go. Its `max_proc` or `max_proc_per_cpu` configuration can be used to restrict the number of spiders running in parallel; with `max_proc = 1`, scrapyd runs queued jobs one at a time, so each spider starts only after the previous one finishes. You then schedule the spiders through scrapyd's REST API.
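Here is a minimal sketch. The `schedule.json` endpoint, default port 6800, and the `max_proc` option are part of scrapyd; the project name `myproject` and the spider names are placeholders for your own deployment. First, restrict scrapyd to one running spider in `scrapyd.conf`:

```
[scrapyd]
max_proc = 1
```

Then queue the spiders in the order they should run; with `max_proc = 1`, each job waits until the previous one finishes:

```python
import requests

SCRAPYD = "http://localhost:6800/schedule.json"  # default scrapyd endpoint

# One spider per webpage, queued in the order they should crawl.
for spider in ["spider_site_one", "spider_site_two", "spider_site_three"]:
    resp = requests.post(SCRAPYD, data={
        "project": "myproject",  # the project you deployed with scrapyd-deploy
        "spider": spider,
    })
    print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}
```

Each call returns immediately with a job id; scrapyd itself takes care of dequeuing and running the jobs sequentially.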