I am using simplehtmldom, which has this function:
// get html dom form file
function file_get_html() {
$dom = new simple_html_dom;
$args = func_get_args();
$dom->load(call_user_func_array('file_get_contents', $args), true);
return $dom;
}
I use it like so:
$html3 = file_get_html(urlencode(trim("$link")));
Sometimes a URL is simply not valid and I want to handle this. I thought I could use try/catch, but that hasn't worked because no exception is thrown; it just gives a PHP warning like this:
[06-Aug-2010 19:59:42] PHP Warning: file_get_contents(http://new.mysite.com/ghs 1/) [<a href='function.file-get-contents'>function.file-get-contents</a>]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in /home/example/public_html/other/simple_html_dom.php on line 39
Line 39 is in the above code.
How can I correctly handle this error? Can I just use a plain if condition? It doesn't look like it returns a boolean.
Thanks all for any help
Update
Is this a good solution?
if(fopen(urlencode(trim("$next_url")), 'r')){
$html3 = file_get_html(urlencode(trim("$next_url")));
}else{
//do other stuff, error_logging
return false;
}
Here's an idea:
function fget_contents() {
$args = func_get_args();
// the @ can be removed if you lower error_reporting level
$contents = @call_user_func_array('file_get_contents', $args);
if ($contents === false) {
// $args[0] is the filename or URL that was passed in
throw new Exception('Failed to open ' . $args[0]);
} else {
return $contents;
}
}
Basically a wrapper around file_get_contents. It will throw an exception on failure.
To avoid having to override file_get_contents itself, you can
// change this
$dom->load(call_user_func_array('file_get_contents', $args), true);
// to
$dom->load(call_user_func_array('fget_contents', $args), true);
Now you can:
try {
$html3 = file_get_html(trim("$link"));
} catch (Exception $e) {
// handle error here
}
Error suppression (either by using @ or by lowering the error_reporting level) is a valid solution. The wrapper above throws exceptions, and you can use that to handle your errors. There are many reasons why file_get_contents might generate warnings, and PHP's manual itself recommends lowering error_reporting: see the manual.
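When the warning is suppressed with @, its text is not lost: error_get_last() still reports it. As a rough sketch, assuming a hypothetical helper named read_or_throw (not part of PHP or simplehtmldom) and PHP 7+ for error_clear_last(), the reason for the failure could be carried into the exception message:

```php
<?php
// Hypothetical helper: suppress the warning but keep its text for the exception.
function read_or_throw($path) {
    // Forget any earlier error so error_get_last() reflects this call only (PHP 7+).
    error_clear_last();

    // @ silences the warning that file_get_contents would otherwise emit.
    $contents = @file_get_contents($path);

    if ($contents === false) {
        // The suppressed warning is still recorded and can be read back here.
        $last = error_get_last();
        $reason = $last !== null ? $last['message'] : 'unknown error';
        throw new RuntimeException("Failed to open $path: $reason");
    }

    return $contents;
}
```

This keeps the logs quiet while still telling the caller why the read failed, for example that the server answered with a 404.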
Use cURL to fetch the URL and handle the error response that way.
A simple example from the curl_init() manual page:
<?php
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
// grab URL and pass it to the browser
curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
?>
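The manual example above does not check for failure. As a minimal sketch of handling the error response, assuming a hypothetical helper named fetch_url (not part of cURL or simplehtmldom) and an arbitrary 5 second connect timeout, you could ask cURL to return the body and to treat HTTP status codes of 400 and above as failures:

```php
<?php
// Hypothetical helper: fetch a URL with cURL and throw when the request fails.
function fetch_url($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Return the response body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Make curl_exec() fail on HTTP status codes >= 400 (such as 404).
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    // Assumed value: give up connecting after 5 seconds.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes what went wrong for this handle.
        throw new RuntimeException("Failed to fetch $url: " . curl_error($ch));
    }

    // The handle is released when $ch goes out of scope.
    return $body;
}
```

The returned string could then be passed to $dom->load() in place of the file_get_contents result, with the call wrapped in try/catch.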
From my POV, good error handling is one of the big challenges in PHP. Fortunately you can register your own Error Handler and decide for yourself what to do.
You can define a fairly simple error handler like this:
function throwExceptionOnError(int $errorCode , string $errorMessage) {
// Usually you would check if the error code is serious
// enough (like E_WARNING or E_ERROR) to throw an exception
throw new Exception($errorMessage);
}
and register it in your function like so:
function file_get_html() {
$dom = new simple_html_dom;
$args = func_get_args();
set_error_handler("throwExceptionOnError");
try {
$dom->load(call_user_func_array('file_get_contents', $args), true);
} finally {
// restore the previous handler even when an exception was thrown
restore_error_handler();
}
return $dom;
}
- For an exhaustive list of error codes see http://php.net/manual/errorfunc.constants.php
- For the complete documentation of set_error_handler, see http://php.net/manual/en/function.set-error-handler.php
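The severity check mentioned in the handler's comment could look roughly like the sketch below. It assumes a hypothetical wrapper named get_contents_or_throw and uses PHP's built-in ErrorException, which keeps the original severity, file, and line; only E_WARNING is escalated here, which is a choice made for illustration:

```php
<?php
// Hypothetical wrapper: run file_get_contents with warnings turned into ErrorException.
function get_contents_or_throw($path) {
    set_error_handler(function ($severity, $message, $file, $line) {
        // Only escalate warnings; returning false hands anything else
        // back to PHP's standard error handler.
        if ($severity !== E_WARNING) {
            return false;
        }
        throw new ErrorException($message, 0, $severity, $file, $line);
    });

    try {
        return file_get_contents($path);
    } finally {
        // Runs on success and on failure, so the previous handler is always restored.
        restore_error_handler();
    }
}

// Usage: a missing file now raises an exception instead of a warning.
try {
    get_contents_or_throw(sys_get_temp_dir() . '/definitely-missing-' . uniqid());
} catch (ErrorException $e) {
    error_log('Could not read file: ' . $e->getMessage());
}
```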
If you're fetching from an external URL, the best handling is going to come from introducing an HTTP library like Zend_Http. This isn't much different from using cURL or fopen, except that it abstracts the particulars of these "drivers" into a universal API, and you can then choose which one to use. It also has some built-in error trapping to make things easier for you.
If you don't want the overhead of another library, you can obviously code it yourself, in which case I always prefer cURL.