Anyone know of a good Python based web crawler tha

Posted 2019-01-03 11:03

I'm half-tempted to write my own, but I don't really have enough time right now. I've seen the Wikipedia list of open source crawlers but I'd prefer something written in Python. I realize that I could probably just use one of the tools on the Wikipedia page and wrap it in Python. I might end up doing that - if anyone has any advice about any of those tools, I'm open to hearing about them. I've used Heritrix via its web interface and I found it to be quite cumbersome. I definitely won't be using a browser API for my upcoming project.

Thanks in advance. Also, this is my first SO question!

8 answers
何必那么认真
#2 · 2019-01-03 11:49

I've used Ruya and found it pretty good.

够拽才男人
#3 · 2019-01-03 12:06

Another simple spider. It uses BeautifulSoup and urllib2. Nothing too sophisticated: it just reads all the a hrefs on a page, builds a list, and goes through it.
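For anyone who wants to see the idea spelled out, here is a rough sketch of that "read the hrefs, build a list, go through it" loop. It is not the linked spider's code. To keep it dependency-free it swaps in Python 3's standard library (html.parser and urllib.request) for BeautifulSoup and urllib2, and the names LinkParser, extract_links, fetch, crawl, and max_pages are all made up for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for all <a href> values found in html."""
    parser = LinkParser()
    parser.feed(html)
    # Resolve relative links against the page URL and drop #fragments.
    return [urldefrag(urljoin(base_url, href)).url for href in parser.hrefs]


def fetch(url):
    """Download a page as text (stand-in for the urllib2 call)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl; returns the URLs fetched, in visit order."""
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)
        for link in extract_links(html, url):
            # Only follow web links, and only ones not queued before.
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling crawl("http://example.com/") would fetch up to max_pages pages starting from that URL. If BeautifulSoup is installed, extract_links could instead loop over soup.find_all("a", href=True). A real crawler would also want robots.txt handling, rate limiting, and a same-domain filter, none of which this sketch attempts.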
