Is there a web crawler library available for PHP or Ruby?

Posted 2019-04-14 00:58

Is there a web crawler library available for PHP or Ruby? I'm looking for a library that can crawl depth-first or breadth-first and that resolves links correctly even when they are relative (href="../relative_path.html") or when the page sets a base URL.

5 answers
女痞
#2 · 2019-04-14 01:22

If you need to scrape web pages that rely on JavaScript, you can use Capybara with a driver that runs a real browser engine, such as Poltergeist. It's usually used with a testing framework for acceptance testing, but it can also be used on its own.

淡お忘
#3 · 2019-04-14 01:27

If you'd like to learn the basics of web crawling and search, you can start by looking at "luna engine".

Anthone
#5 · 2019-04-14 01:33

Check this page out for a Ruby library: Ruby Mechanize

Note that you would still be responsible for how your crawler traverses sites: the traversal order (depth-first or breadth-first) and the bookkeeping of visited pages are up to you.
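To illustrate what that traversal logic might look like, here is a minimal sketch using only Ruby's standard library. The names `resolve_link`, `crawl`, and `fetch_links` are my own, not part of Mechanize or any other library. `fetch_links` is a hypothetical callable that returns the raw href strings of a page; in a real crawler it could be backed by something like `agent.get(url).links.map(&:href)` with Mechanize. Relative hrefs such as "../relative_path.html" are resolved with `URI.join`; if the page declares a `<base href>`, you would pass that value instead of the page URL.

```ruby
require 'uri'
require 'set'

# Resolve an href against a base URL (the page URL, or the value of
# <base href> if the page declares one). Returns nil for invalid hrefs.
def resolve_link(base_url, href)
  URI.join(base_url, href).to_s
rescue URI::Error
  nil
end

# Visit pages reachable from start_url and return them in visit order.
# fetch_links: hypothetical callable, URL -> array of raw href strings.
# strategy:    :bfs takes the oldest queued URL, :dfs the newest.
# limit:       safety cap on the number of pages visited.
def crawl(start_url, fetch_links, strategy: :bfs, limit: 100)
  frontier = [start_url]
  seen = Set.new([start_url])
  order = []

  until frontier.empty? || order.size >= limit
    url = strategy == :bfs ? frontier.shift : frontier.pop
    order << url

    fetch_links.call(url).each do |href|
      link = resolve_link(url, href)
      next if link.nil? || seen.include?(link)
      seen << link
      frontier << link
    end
  end

  order
end

# Usage with an in-memory "site" standing in for real HTTP requests.
site = {
  'http://example.com/a/index.html' => ['../b.html', 'c.html'],
  'http://example.com/b.html'       => ['a/d.html']
}
fetch = ->(url) { site.fetch(url, []) }
puts crawl('http://example.com/a/index.html', fetch, strategy: :bfs)
```

The only difference between the two strategies here is which end of the frontier the next URL is taken from. A production crawler would also want to restrict hosts, honor robots.txt, and throttle requests, none of which this sketch attempts.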

干净又极端
#6 · 2019-04-14 01:44

You can go for Webrat or Watir in Ruby; I find them much easier than Mechanize.
