How to scrape a JavaScript site using PHP and cURL [duplicate]

Posted 2019-03-06 22:44

Question:

Possible Duplicate:
How do I render javascript from another site, inside a PHP application?

The site is http://www.oferta.pl/strona_v2/gazeta_v2/ and it is built entirely with JavaScript. I want to scrape it using PHP and cURL; currently I use DOMXPath. The left menu has some categories that can be selected, but I see no 'form' there. How can I use cURL to submit that form and scrape the resulting page?

I have only used file_get_contents(), and it doesn't get the whole page. How can I proceed?
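Why file_get_contents() plus DOMXPath misses content can be seen in a minimal sketch. The HTML below is made up (it is not oferta.pl's real markup): the category list arrives empty from the server and is only populated by a script running in the browser, so an XPath query against the downloaded source finds nothing. `menuItems()` is a hypothetical helper name.

```php
<?php
// HYPOTHETICAL markup standing in for what the server sends: the menu <ul>
// arrives empty and is only filled in by JavaScript running in a browser.
$rawHtml = <<<'HTML'
<html><body>
  <ul id="categories"></ul>
  <script>
    var li = document.createElement('li');
    li.textContent = 'Cars';
    document.getElementById('categories').appendChild(li);
  </script>
</body></html>
HTML;

/** Return the text of every <li> directly under the element with the given id. */
function menuItems(string $html, string $id): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings about sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $items = [];
    foreach ($xpath->query("//*[@id='$id']/li") as $li) {
        $items[] = trim($li->textContent);
    }
    return $items;
}

var_dump(menuItems($rawHtml, 'categories')); // empty array: PHP never runs the <script>
```

If the same query finds items in the page as saved from the browser (after its scripts have run) but not in the downloaded source, the data is being loaded by JavaScript.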

N.B.: I found this example, http://www.html-form-guide.com/php-form/php-form-submit.html, which uses a 'form'. But the site I'm targeting has no 'form'.

Answer 1:

You can't really scrape it. Well, it's possible, but it's far too hard:

  1. Simulate the HTTP requests with cURL. Inspect every request the page makes via AJAX and try to reproduce it.

  2. Simulate the JavaScript execution (this part is almost impossible). Some requests contain values that are generated by JavaScript, so you would have to compute them in PHP. If they come from a complicated algorithm implemented in JS, you can invoke the V8 JavaScript engine instead.
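The two steps can be sketched in PHP under loudly stated assumptions: the endpoint URL, the field names and the payload format (a `btoa(JSON.stringify(...))` value) are hypothetical placeholders, not what oferta.pl actually uses. In practice you would copy the real request from the browser's developer tools (Network tab, XHR filter) and read the page's JavaScript to see how any generated values are built. `jsStylePayload()` ports such a value to PHP (step 2); `ajaxOptions()` and `fetch()` replay the request with cURL (step 1). All three function names are illustrative.

```php
<?php
// Endpoint, field names and payload format are HYPOTHETICAL placeholders.

/**
 * Step 2: PHP port of a hypothetical client-side snippet such as
 *   var q = btoa(JSON.stringify({cat: ids, ts: Date.now()}));
 */
function jsStylePayload(array $categoryIds, ?int $timestampMs = null): string
{
    // Date.now() returns milliseconds since the Unix epoch.
    $ts = $timestampMs ?? (int) round(microtime(true) * 1000);

    // JSON.stringify does not escape "/" or non-ASCII characters; json_encode does by default.
    $json = json_encode(
        ['cat' => array_values($categoryIds), 'ts' => $ts],
        JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE
    );

    // btoa() is plain Base64 (it only accepts Latin-1 input, so ASCII is assumed here).
    return base64_encode($json);
}

/** Step 1: cURL options that mimic the AJAX POST the page sends. */
function ajaxOptions(string $url, array $fields, string $referer): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields), // form-urlencoded body
        CURLOPT_RETURNTRANSFER => true,                       // return the body as a string
        CURLOPT_REFERER        => $referer,
        // jQuery adds this header to same-origin AJAX calls; some servers check it.
        CURLOPT_HTTPHEADER     => ['X-Requested-With: XMLHttpRequest'],
    ];
}

/** Execute the request and return the raw response body (often JSON or an HTML fragment). */
function fetch(array $options): string
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical usage (not executed here):
// $body = fetch(ajaxOptions(
//     'https://example.com/ajax-endpoint',           // placeholder for the real endpoint
//     ['q' => jsStylePayload([12, 15])],
//     'http://www.oferta.pl/strona_v2/gazeta_v2/'
// ));
```

Porting by hand only works when the JavaScript logic is small and readable. For anything more involved, running the original script in an embedded engine (for example the V8Js PECL extension) avoids re-implementing it.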