I need to parse some HTML files. However, they are not well-formed, and PHP prints out warnings. I want to suppress or handle this warning output programmatically. Please advise. Thank you!
Code:
// create a DOM document and load the HTML data
$xmlDoc = new DOMDocument();
// this dumps out the warnings
$xmlDoc->loadHTML($fetchResult);
This:
@$xmlDoc->loadHTML($fetchResult)
can suppress the warnings, but how can I capture those warnings programmatically?
To hide the warnings, you have to give special instructions to libxml, which is used internally to perform the parsing.

Calling libxml_use_internal_errors(true) indicates that you're going to handle the errors and warnings yourself and you don't want them to mess up the output of your script.

This is not the same as the @ operator. The warnings get collected behind the scenes, and afterwards you can retrieve them by using libxml_get_errors() in case you wish to perform logging or return the list of issues to the caller.

Whether or not you're using the collected warnings, you should always clear the queue by calling libxml_clear_errors().

Preserving the state
If you have other code that uses libxml, it may be worthwhile to make sure your code doesn't alter the global state of the error handling; for this, you can use the return value of libxml_use_internal_errors() to save the previous state.

You can install a temporary error handler with set_error_handler.
Usage:
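The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of the set_error_handler approach. The helper name loadHtmlCapturingWarnings and the sample markup are hypothetical, not taken from the answer. It installs a handler for E_WARNING, collects the messages in an array, and restores the previous handler afterwards.

```php
<?php
// Hypothetical helper (an assumption, not the original answer's code):
// loads HTML while a temporary error handler collects the warnings that
// DOMDocument::loadHTML() would otherwise print.
function loadHtmlCapturingWarnings(DOMDocument $doc, string $html): array
{
    $warnings = [];

    // Handle warnings only; returning true keeps PHP's built-in handler
    // from printing the message.
    set_error_handler(
        function (int $errno, string $errstr) use (&$warnings): bool {
            $warnings[] = $errstr;
            return true;
        },
        E_WARNING
    );

    try {
        $doc->loadHTML($html);
    } finally {
        // Put the previous handler back, even if loading throws.
        restore_error_handler();
    }

    return $warnings;
}

// Usage with deliberately malformed sample markup (assumed input).
$xmlDoc = new DOMDocument();
$warnings = loadHtmlCapturingWarnings(
    $xmlDoc,
    '<html><body><p>Unclosed <b>tag</p><foo></body></html>'
);

foreach ($warnings as $message) {
    error_log('HTML parse warning: ' . trim($message));
}
```

Which messages end up in $warnings depends on the libxml2 version PHP is built against, so the sketch makes no claim about their number or wording.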
Setting the options "LIBXML_NOWARNING" & "LIBXML_NOERROR" works perfectly fine too:
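The snippet for this answer is missing as well; a minimal sketch, assuming the options are passed as the second argument of loadHTML() and using made-up sample markup, could look like this:

```php
<?php
// Sample input (an assumption for illustration); in the question this
// would be the fetched HTML in $fetchResult.
$fetchResult = '<html><body><p>Unclosed <b>tag</p><foo></body></html>';

$xmlDoc = new DOMDocument();

// The second argument is a bitmask of libxml option constants.
// LIBXML_NOERROR and LIBXML_NOWARNING ask libxml not to report
// errors and warnings for this call.
$loaded = $xmlDoc->loadHTML($fetchResult, LIBXML_NOWARNING | LIBXML_NOERROR);
```

Note that this only silences the reports for that one call. As far as I can tell they are then not collected anywhere, so this fits the "hide" case rather than the "capture" case from the question.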
Call libxml_use_internal_errors(true) prior to processing with $xmlDoc->loadHTML(). This tells libxml2 not to send errors and warnings through to PHP. Then, to check for errors and handle them yourself, you can consult libxml_get_last_error() and/or libxml_get_errors() when you're ready.
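Putting the libxml functions mentioned in this thread together, a rough sketch might look like the following. The helper name loadHtmlCollectingErrors and the sample markup are hypothetical. It switches to internal error handling, collects the errors, clears the queue, and restores whatever setting was active before.

```php
<?php
// Hypothetical helper (an assumption for illustration): parses HTML and
// returns the LibXMLError objects libxml collected, leaving the global
// error-handling mode as it was found.
function loadHtmlCollectingErrors(DOMDocument $doc, string $html): array
{
    // Enable internal error handling and remember the previous setting.
    $previous = libxml_use_internal_errors(true);

    $doc->loadHTML($html);

    // Fetch everything libxml queued, then empty the queue.
    $errors = libxml_get_errors();
    libxml_clear_errors();

    // Restore the previous mode so other libxml users are unaffected.
    libxml_use_internal_errors($previous);

    return $errors;
}

// Usage with deliberately malformed sample markup (assumed input).
$xmlDoc = new DOMDocument();
$errors = loadHtmlCollectingErrors(
    $xmlDoc,
    '<html><body><p>Unclosed <b>tag</p><foo></body></html>'
);

foreach ($errors as $error) {
    // Each entry is a LibXMLError with level, code, line, column, message.
    error_log(sprintf(
        'libxml [%d] line %d: %s',
        $error->code,
        $error->line,
        trim($error->message)
    ));
}
```

Returning the errors instead of logging inside the helper leaves it to the caller to decide whether to log them, ignore them, or pass them on.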