Simple HTML DOM returning false

Posted 2019-08-14 14:31

I've run into something strange while using Simple HTML DOM to parse a web page with certain query strings. I'm trying to parse the used-car page of a dealership's website; some query strings work and others do not. The pattern seems to be that whenever there are more vehicles still to be shown, the HTML content is not returned (that is, the last page of the pagination works, but earlier pages don't). I've tried viewing the page with JavaScript disabled to see whether the markup differs, but the page seems to behave the same way. The code is below if anyone has ideas, or better yet, solutions. Thanks all!

require 'simple_html_dom.php';
error_reporting(E_ALL);

// Page 2 of the pre-owned inventory search results.
$startingURL = 'http://www.buickgmcofmilford.com/VehicleSearchResults?model=&certified=&location=&miles=&maxPrice=&minYear=&maxYear=&bodyType=&search=preowned&trim=&make=&pageNumber=2';

// file_get_html() returns a simple_html_dom object on success, or false on failure.
$getHTML = file_get_html($startingURL);

if ($getHTML !== false) {
    echo '<h1>TRUE</h1>';
} else {
    echo '<h1>FALSE</h1>';
}
var_dump($getHTML);

With the URL above, var_dump shows a boolean false. With the following URL, I can parse the data with no issue: http://www.buickgmcofmilford.com/VehicleSearchResults?model=&certified=&location=&miles=&maxPrice=&minYear=&maxYear=&bodyType=&search=preowned&trim=&make=&pageNumber=5

Thanks.

Tags: php html parsing simple-html-dom

1 Answer

聊天终结者

#2 · 2019-08-14 15:07

You should not use the default file_get_html function to fetch remote content. That function uses file_get_contents to download the page, and the target website will sometimes block such a request based on its user agent or referer. Try downloading the page with PHP cURL first, then parse it with simple_html_dom.
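Here is a minimal sketch of that approach, in case it helps. The function names fetchPage and parseRemotePage, the User-Agent string, and the timeout values are my own assumptions for illustration; they are not part of simple_html_dom. Only str_get_html comes from the library, and it has to be loaded by the calling script.

```php
<?php
// Sketch only: download a page with cURL using browser-like headers,
// then hand the body to simple_html_dom. Names and header values below
// are illustrative assumptions, not anything the target site requires.

function fetchPage(string $url, int $timeoutSeconds = 15): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_ENCODING       => '',     // accept any encoding cURL supports (gzip, ...)
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)', // hypothetical value
        CURLOPT_REFERER        => $url,   // assumption: some sites check for a referer
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors, non-2xx responses and empty bodies as failure.
    if ($body === false || $status < 200 || $status >= 300 || $body === '') {
        return null;
    }
    return $body;
}

// Requires simple_html_dom.php to be loaded by the caller, since
// str_get_html() is defined there. Returns false on any failure.
function parseRemotePage(string $url)
{
    $body = fetchPage($url);
    if ($body === null) {
        return false;
    }
    return str_get_html($body);
}
```

After require 'simple_html_dom.php', calling parseRemotePage($startingURL) should give you either a DOM object or false. One caveat: in the versions of simple_html_dom I have looked at, both file_get_html and str_get_html also return false when the content is longer than the library's MAX_FILE_SIZE constant, so if fetchPage succeeds but parsing still fails, it may be worth checking strlen() of the downloaded body.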
