simple html dom scraping large html file

2019-02-13 17:17发布

问题:

I need to scrape a large html file (eg: http://www.indianrail.gov.in/mail_express_trn_list.html) using simple html dom. I started with a simple script:

<?php
require "simple_html_dom.php";
echo file_get_html('http://www.indianrail.gov.in/mail_express_trn_list.html')->plaintext;
?>

which shows nothing, just a blank page with the error message in Apache error.log file

 PHP Notice:  Trying to get property of non-object in /var/www/index.php on line 3
 PHP Notice:  Trying to get property of non-object in /var/www/index.php on line 3

at the same time all other pages (eg: http://www.indianrail.gov.in/special_trn_list.html) works fine with the same script.

回答1:

The issue appears to be MAX_FILE_SIZE defined in simple_html_dom.

you can adjust it by editing define('MAX_FILE_SIZE', 600000); line in simple_html_dom.php file.