Is there any Python module that helps to crawl dat

Posted 2019-05-27 05:06

Question:

I want to scrape data from a page that loads its DOM elements through Ajax calls.

I have tried the older solutions, like PyQt4-based scraping, which reads the DOM after the page has fully loaded, but the problem is that I need to do a POST request and that approach only supports GET.

The newer Python module ghost.py has timeout issues: when it fetches a large DOM tree, it raises a timeout exception.

If anyone knows a specific method or tool that lets me send a POST request and grab the data once the DOM is fully loaded, that would help me a lot.

Answer 1:

You can use Selenium to automate a browser and access the DOM. Selenium has a Python driver, so you can write Python code that navigates to the page, clicks buttons, and waits for the Ajax calls to complete before you start scraping.
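Something along these lines might work as a starting point. It is only a rough sketch, not tested against your site: Selenium's `driver.get()` only issues GET requests, so the POST is done by injecting a hidden form with `execute_script` and submitting it from inside the browser. The function names `build_post_script` and `fetch_after_ajax`, the choice of Firefox, and every URL, field, and selector are made up for illustration.

```python
import json


def build_post_script(url, fields):
    """Return JavaScript that builds a hidden form and submits it via POST."""
    # json.dumps produces quoted JavaScript literals for the URL and the fields.
    return (
        "var form = document.createElement('form');"
        "form.method = 'POST';"
        "form.action = %s;"
        "var fields = %s;"
        "for (var name in fields) {"
        "  var input = document.createElement('input');"
        "  input.type = 'hidden';"
        "  input.name = name;"
        "  input.value = fields[name];"
        "  form.appendChild(input);"
        "}"
        "document.body.appendChild(form);"
        "form.submit();"
    ) % (json.dumps(url), json.dumps(fields))


def fetch_after_ajax(start_url, post_url, fields, css_selector, timeout=30):
    """Submit a POST from a real browser, wait for Ajax content, return the HTML."""
    # Imported lazily so build_post_script works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are installed
    try:
        # Open any page first so the injected form has a document to attach to.
        driver.get(start_url)
        driver.execute_script(build_post_script(post_url, fields))
        # Block until the Ajax-loaded element shows up, or raise TimeoutException.
        # Pick a selector that only matches on the result page, not the start page.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()
```

You would call it like `html = fetch_after_ajax("https://example.com/", "https://example.com/search", {"q": "ajax"}, "#results")` and then hand `html` to whatever parser you prefer. The explicit wait with a generous `timeout` is what keeps a large, slowly loading DOM from failing early.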



Answer 2:

For emulating JavaScript and automating a browser, I recommend Spynner. You can run it with or without an X server, and the syntax is quite simple to use. You can load jQuery too.