web crawling tools which support interacting with

Posted 2019-09-03 10:10

Question:

I am looking for a crawler that can handle pages that use Ajax and can perform certain user interactions with the target site before it starts crawling (e.g., clicking on certain menu items, filling in some forms, etc.). I tried WebDriver/Selenium (which are really web scraping tools), and now I want to know whether any crawler is available that supports emulating such user interactions before it starts to crawl (in Java, Python, Ruby, ...).

Thanks

P.S. Can Nutch do this? If so, I would appreciate any link describing how.

Answer 1:

Nutch does not handle AJAX, cookies or any of the user interactions that you described.



Answer 2:

You could try hooking Selenium up to a Python-based crawler such as Scrapy. Whenever a page needs AJAX handling, the crawler hands it to Selenium, which drives a separate browser process to render the page before it is scraped.
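
To make the idea concrete, here is a minimal sketch of that split. It is not Scrapy itself: the crawl loop below is a small stdlib stand-in, and the names `crawl`, `extract_links`, `make_selenium_fetch` and `setup` are made up for illustration. The assumption is that the crawler only needs a `fetch(url)` function returning HTML, so a Selenium-backed fetcher can perform the clicks and form fills once and then serve rendered pages to the crawler.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Returns absolute link URLs found in `html`, without #fragments."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, h)).url for h in parser.hrefs]


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl; `fetch(url)` must return the page's HTML."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        pages[url] = fetch(url)
        for link in extract_links(url, pages[url]):
            # Simplification: stay in scope by requiring the start URL as prefix.
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def make_selenium_fetch(setup):
    """Builds a fetch function backed by a real browser (needs selenium)."""
    from selenium import webdriver  # third-party; imported only when used

    driver = webdriver.Firefox()
    setup(driver)  # hypothetical hook: click menus, fill in forms, log in

    def fetch(url):
        driver.get(url)
        # DOM as currently rendered; slow AJAX calls may need explicit waits.
        return driver.page_source

    return fetch
```

With Selenium installed, something like `crawl(start_url, make_selenium_fetch(my_setup))` would run the interactions in `my_setup` first and then crawl with the same browser session. Because `fetch` is just a function, the crawl logic can be tried out without a browser by passing in a plain dictionary lookup.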