Using Scrapy to parse sitemaps

Posted 2020-07-31 05:40

I want to use Scrapy to crawl the links listed in a sitemap. I don't know much about Scrapy yet, so I would appreciate any links, information, or documentation you could provide.

Thanks

2 Answers
放我归山
#2 · 2020-07-31 06:22

A new generic spider for this purpose, SitemapSpider, has just been added to the Scrapy trunk. It will be available in the next release (Scrapy 0.14).

小情绪 Triste *
#3 · 2020-07-31 06:34

All of the documentation is at http://doc.scrapy.org/, and the tutorial can also be found at scrapy.org.

As for your specific question, see this Stack Overflow question: "How to parse a sitemap.xml file using Scrapy's XmlFeedSpider?"
