I was trying to check whether a string is valid XML using the simplexml_load_string() function, but it displays a lot of warning messages.
How can I check whether a string is valid XML without suppressing the error (with @ at the beginning) and without displaying a warning?
Use libxml_use_internal_errors() to suppress all XML errors, and libxml_get_errors() to iterate over them afterwards.
Simple XML loading string
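libxml_use_internal_errors(true);
$doc = simplexml_load_string($xmlstr);
$xml = explode("\n", $xmlstr);
// compare with false: an empty root element such as <root/> is also falsy
if ($doc === false) {
    $errors = libxml_get_errors();
    foreach ($errors as $error) {
        // display_xml_error() is a user-defined formatting helper, not a built-in
        echo display_xml_error($error, $xml);
    }
    libxml_clear_errors();
}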
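The display_xml_error() function called above is not defined in the snippet and is not part of PHP. A minimal sketch of such a helper might look like the following; the output format is an assumption made for illustration, and only the LibXMLError properties (level, code, message, line, column) come from libxml itself.

```php
<?php
// Hypothetical helper: formats one LibXMLError for display.
// $lines is the source XML split on "\n", as in the snippet above.
function display_xml_error(LibXMLError $error, array $lines): string
{
    // Map the libxml severity constant to a readable label.
    switch ($error->level) {
        case LIBXML_ERR_WARNING:
            $label = 'Warning';
            break;
        case LIBXML_ERR_ERROR:
            $label = 'Error';
            break;
        default:
            $label = 'Fatal error';
    }

    // libxml line numbers are 1-based; the array is 0-based.
    $source = $lines[$error->line - 1] ?? '';

    return sprintf(
        "%s %d: %s (line %d, column %d)\n  %s\n",
        $label,
        $error->code,
        trim($error->message),
        $error->line,
        $error->column,
        $source
    );
}
```

Returning a string rather than echoing keeps the helper usable for logging as well as for direct output.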
From the documentation:
Dealing with XML errors when loading documents is a very simple task. Using the libxml functionality it is possible to suppress all XML errors when loading the document and then iterate over the errors.
The LibXMLError object, returned by libxml_get_errors(), contains several properties including the message, line and column (position) of the error.
libxml_use_internal_errors(true);
$sxe = simplexml_load_string("<?xml version='1.0'><broken><xml></broken>");
if ($sxe === false) {
    echo "Failed loading XML\n";
    foreach (libxml_get_errors() as $error) {
        echo "\t", $error->message;
    }
}
Reference: libxml_use_internal_errors
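To make the LibXMLError properties mentioned in the quote more concrete, here is a rough sketch that collects them into plain arrays. The function name collect_xml_errors() and the array keys are assumptions chosen for this example; the sketch also restores the previous libxml_use_internal_errors() setting so that other code is not affected.

```php
<?php
// Hypothetical helper: returns parse errors as plain arrays,
// or an empty array if no errors were reported.
function collect_xml_errors(string $xml): array
{
    // Remember the previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    simplexml_load_string($xml);

    $result = [];
    foreach (libxml_get_errors() as $error) {
        $result[] = [
            'level'   => $error->level,
            'code'    => $error->code,
            'message' => trim($error->message),
            'line'    => $error->line,
            'column'  => $error->column,
        ];
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $result;
}

// Usage: print each error with its position.
foreach (collect_xml_errors("<?xml version='1.0'><broken><xml></broken>") as $e) {
    printf("line %d, column %d: %s\n", $e['line'], $e['column'], $e['message']);
}
```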
My version looks like this:
// validate only XML; HTML is rejected
function isValidXml($content)
{
    // cast first so that null, integers and booleans are handled without notices
    $content = trim((string) $content);
    if (empty($content)) {
        return false;
    }
    // reject HTML documents outright
    if (stripos($content, '<!DOCTYPE html>') !== false) {
        return false;
    }
    libxml_use_internal_errors(true);
    simplexml_load_string($content);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    return empty($errors);
}
Tests:
//false
var_dump(isValidXml('<!DOCTYPE html><html><body></body></html>'));
//true
var_dump(isValidXml('<?xml version="1.0" standalone="yes"?><root></root>'));
//false
var_dump(isValidXml(null));
//false
var_dump(isValidXml(1));
//false
var_dump(isValidXml(false));
//false
var_dump(isValidXml('asdasds'));
Try this one:
// check if the XML is a valid document (method of a class)
public function _isValidXML($xml) {
    $doc = @simplexml_load_string($xml);
    // compare with false: an empty root element such as <root/> is also falsy
    return $doc !== false;
}
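One caveat with testing the result of simplexml_load_string() for truthiness: PHP treats a SimpleXMLElement created from an empty element without attributes as false, so a well-formed document like <root/> would be reported as invalid. The short sketch below illustrates the difference; the function name is_well_formed() is made up for this example.

```php
<?php
// Hypothetical helper: true when the string parses, even for an empty root.
function is_well_formed(string $xml): bool
{
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Strict comparison: only a real parse failure yields false.
    return $doc !== false;
}

libxml_use_internal_errors(true);
$empty = simplexml_load_string('<root/>');
libxml_clear_errors();

var_dump((bool) $empty);            // bool(false), although the XML is fine
var_dump(is_well_formed('<root/>')); // bool(true)
var_dump(is_well_formed('<root>'));  // bool(false)
```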
Here is a small piece of a class I wrote a while ago:
/**
 * Class XmlParser
 * @author Francesco Casula <fra.casula@gmail.com>
 */
class XmlParser
{
    /**
     * @param string $xmlFilename Path to the XML file
     * @param string $version 1.0
     * @param string $encoding utf-8
     * @return bool
     */
    public function isXMLFileValid($xmlFilename, $version = '1.0', $encoding = 'utf-8')
    {
        $xmlContent = file_get_contents($xmlFilename);
        return $this->isXMLContentValid($xmlContent, $version, $encoding);
    }

    /**
     * @param string $xmlContent A well-formed XML string
     * @param string $version 1.0
     * @param string $encoding utf-8
     * @return bool
     */
    public function isXMLContentValid($xmlContent, $version = '1.0', $encoding = 'utf-8')
    {
        if (trim($xmlContent) == '') {
            return false;
        }
        libxml_use_internal_errors(true);
        $doc = new DOMDocument($version, $encoding);
        $doc->loadXML($xmlContent);
        $errors = libxml_get_errors();
        libxml_clear_errors();
        return empty($errors);
    }
}
It works fine with streams and vfsStream as well for testing purposes.
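As a rough illustration of the stream remark, the sketch below reads the document through a file:// path. It is a standalone variant written for this example, not the class above: the function name is_xml_file_valid() is hypothetical, and it relies on the boolean returned by loadXML() rather than on the collected error list.

```php
<?php
// Hypothetical standalone variant of isXMLFileValid(); not part of the class above.
function is_xml_file_valid(string $path): bool
{
    // is_readable() works for wrappers that support stat(), such as file://.
    if (!is_readable($path)) {
        return false;
    }

    $content = file_get_contents($path);
    if ($content === false || trim($content) === '') {
        return false;
    }

    // Collect libxml errors internally and restore the old setting afterwards.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $loaded = $doc->loadXML($content);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $loaded;
}

// Usage: write a sample document to a temporary file and read it via file://.
$tmp = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($tmp, '<?xml version="1.0"?><root><item/></root>');
var_dump(is_xml_file_valid('file://' . $tmp));
unlink($tmp);
```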
Case
Occasionally check availability of a Google Merchant XML feed.
The feed has no DTD, so validate() won't work.
Solution
// disable forwarding those load() errors to PHP
libxml_use_internal_errors(true);
// initiate the DOMDocument and attempt to load the XML file;
// load() returns false when the file cannot be read or parsed
$dom = new \DOMDocument;
$loaded = $dom->load($path_to_xml_file);
// check if the file contents are what we're expecting them to be
// `item` here is for Google Merchant, replace with what you expect
$success = $dom->getElementsByTagName('item')->length > 0;
// alternatively, just check if the file was loaded successfully,
// using load()'s return value ($dom->actualEncoding is deprecated)
$success = $loaded;
The length property above holds the number of products actually listed in the file. You can use your own tag names instead.
Logic
You can call getElementsByTagName() on any other tag name (the item tag I used is for Google Merchant; your case may vary), or read other properties on the $dom object itself. The logic stays the same: instead of checking whether there were errors when loading the file, I believe actually trying to manipulate it (or specifically checking that it contains the values you actually need) is more reliable.
Most important: unlike validate(), this won't require your XML to have a DTD.
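The same idea can be wrapped in a small function. The sketch below is only an illustration under a few assumptions: the name count_elements() is made up, it takes the XML as a string (via loadXML()) instead of a file path so that it is easy to try out, and it returns null when the document cannot be parsed.

```php
<?php
// Hypothetical helper: number of <$tag> elements in the XML string,
// or null if the string cannot be parsed at all.
function count_elements(string $xml, string $tag): ?int
{
    // loadXML() rejects an empty string, so handle that case up front.
    if (trim($xml) === '') {
        return null;
    }

    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $loaded = $dom->loadXML($xml);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return null;
    }

    return $dom->getElementsByTagName($tag)->length;
}

// Usage: a feed is considered usable only if it contains at least one item.
$feed = '<rss><channel><item/><item/></channel></rss>';
$count = count_elements($feed, 'item');
var_dump($count !== null && $count > 0);
```

Separating "could not be parsed" (null) from "parsed but contains no matching elements" (0) lets the caller decide how strict the check should be.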