I'm trying to parse a XML file, but when loading it simpleXML prints the following warning:
Warning: simplexml_load_file() [function.simplexml-load-file]: gpr_545.xml:55: parser error : Entity 'Oslash' not defined in import.php on line 35
This is that line:
<forenames>BØIE</forenames><x> </x>
As it is a warning, I might ignore it, but I'd like to understand what is happening.
HTML Encoding of Latin1 characters (like Ø, what that character describes) is what has broken the XML parser. If you're in control of the data, you need to escape it using XML style character encoding (Ø just happens to be & #216;)
HTML-entities like Ø is not the same as XML-entities. Here's a table for replacing HTML-entities to XML-entities.
As I can tell from one of your comments to another post, you're having trouble with an entity /. I don't know if this even is a valid HTML-entity, my Firefox won't show the character - only ouputs the entity name. But I found an other table for most entities and their character reference number. Try adding them to your replace-table and you should be safe. /'s reference number is / by the way.
I think this is an encoding problem. php, simplexml in this particular case, does not like the danish O you've got in that fornames tag. You could try to encode the whole file in utf-8 and removing the escaped version from the tag by that. Aferwards you can read a fully escaped character free file into simplexml.
K
Just had a very similar problem and solved it in the following way. The main idea was to load a file into a string, replace all bad entities on something like "[[entity]]Oslash;" and carry out reverse replacement before displaying some xml node.
function readXML($filename){
$xml_string = implode("", file($filename));
$xml_string = str_replace("&", "[[entity]]", $xml_string);
return simplexml_load_string($xml_string);
}
function xml2str($xml){
$str = str_replace("[[entity]]", "&", (string)$xml);
$str = iconv("UTF-8", "WINDOWS-1251", $str);
return $str;
}
$xml = readXML($filename);
echo xml2str($xml->forenames);
iconv("UTF-8", "WINDOWS-1251", $str) as I have "WINDOWS-1251" encoding on my page
Try to use this line:
<forenames><![CDATA[BØIE]]></forenames><x> </x>
and read this about CDATA