Problem with SimpleXML and entity not being defined

Posted 2019-01-28 19:17

I'm trying to parse an XML file, but when loading it SimpleXML prints the following warning:

Warning: simplexml_load_file() [function.simplexml-load-file]: gpr_545.xml:55: parser error : Entity 'Oslash' not defined in import.php on line 35

This is that line:

<forenames>B&Oslash;IE</forenames><x> </x>

As it is a warning, I might ignore it, but I'd like to understand what is happening.

5 answers
【Aperson】
#2 · 2019-01-28 19:54

The HTML encoding of a Latin-1 character (Ø, which is what that entity stands for) is what breaks the XML parser. If you're in control of the data, you need to escape it with an XML-style numeric character reference instead (Ø just happens to be &#216;).
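
As a minimal sketch of that replacement, assuming the XML is already in a string (the `<person>` wrapper and the variable names are made up for illustration, only the `forenames` line comes from the question):

```php
<?php
// Hypothetical sample wrapping the line from the question in a root element.
$raw = '<person><forenames>B&Oslash;IE</forenames><x> </x></person>';

// Swap the HTML named entity for its numeric character reference,
// which an XML parser accepts without any DTD.
$fixed = str_replace('&Oslash;', '&#216;', $raw);

$xml = simplexml_load_string($fixed);
echo (string) $xml->forenames, "\n"; // prints BØIE (as UTF-8)
```

This only handles the one entity named in the warning; other HTML entities in the file would need their own replacements.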

Melony?
#3 · 2019-01-28 19:58

I think this is an encoding problem. PHP, or SimpleXML in this particular case, does not like the Danish Ø you've got in that forenames tag. You could try encoding the whole file in UTF-8 and replacing the escaped version in the tag with the actual character. Afterwards you can read a file that is free of escaped characters into SimpleXML.
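
One possible way to do that replacement in PHP, sketched under the assumption that the file content is UTF-8 (or plain ASCII) and held in a string. The helper name `namedEntitiesToUtf8` and the sample data are hypothetical. Each named entity is decoded to its character and then re-escaped for XML, so `&amp;` and `&lt;` survive intact while `&Oslash;` becomes a literal Ø:

```php
<?php
// Hypothetical helper: turn HTML named entities into literal UTF-8
// characters while keeping the result valid XML.
function namedEntitiesToUtf8(string $xml): string
{
    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        static function (array $m): string {
            // Decode the entity to its character(s); unknown names stay unchanged.
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            // Re-escape anything XML cannot hold literally (&, <, >, quotes).
            return htmlspecialchars($char, ENT_QUOTES | ENT_XML1, 'UTF-8');
        },
        $xml
    );
}

// Made-up sample based on the line from the question.
$raw = '<person><forenames>B&Oslash;IE &amp; Co</forenames></person>';
$clean = namedEntitiesToUtf8($raw);
$xml = simplexml_load_string($clean);
echo (string) $xml->forenames, "\n";
```

Note that this pattern also rewrites entity-like text inside CDATA sections, which may or may not be what you want.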

放我归山
#4 · 2019-01-28 20:02

I just had a very similar problem and solved it in the following way. The main idea is to load the file into a string, replace every ampersand with a placeholder so that a bad entity becomes something like "[[entity]]Oslash;", and do the reverse replacement before displaying an XML node.

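// Read the file and hide every "&" behind a placeholder so the parser
// never sees an undefined entity such as &Oslash;.
function readXML($filename){
    $xml_string = file_get_contents($filename);
    $xml_string = str_replace("&", "[[entity]]", $xml_string);
    return simplexml_load_string($xml_string);
}
// Restore the "&" in a node's text and convert it to the page encoding.
function xml2str($xml){
    $str = str_replace("[[entity]]", "&", (string)$xml);
    $str = iconv("UTF-8", "WINDOWS-1251", $str);
    return $str;
}
$xml = readXML($filename);
echo xml2str($xml->forenames);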

The iconv("UTF-8", "WINDOWS-1251", $str) call is there because my page uses the WINDOWS-1251 encoding.

forever°为你锁心
#5 · 2019-01-28 20:05

Try using this line:

<forenames><![CDATA[B&Oslash;IE]]></forenames><x> </x>

and read up on CDATA.
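
A small sketch of what the CDATA version gives you in PHP (the `<person>` wrapper is assumed for illustration). The parser no longer complains, but the node text is the literal string `B&Oslash;IE`, so the entity still has to be decoded if the character itself is wanted:

```php
<?php
// Hypothetical sample: the CDATA line above wrapped in a root element.
$raw = '<person><forenames><![CDATA[B&Oslash;IE]]></forenames><x> </x></person>';

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$xml = simplexml_load_string($raw, 'SimpleXMLElement', LIBXML_NOCDATA);

$text = (string) $xml->forenames;   // still the literal text B&Oslash;IE
$decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $text, "\n";
echo $decoded, "\n";
```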

够拽才男人
#6 · 2019-01-28 20:12

HTML entities like &Oslash; are not the same as XML entities. Here's a table for replacing HTML entities with XML entities.

As far as I can tell from one of your comments on another post, you're having trouble with an entity /. I don't know if this is even a valid HTML entity; my Firefox won't show the character and only outputs the entity name. But I found another table for most entities and their character reference numbers. Try adding them to your replace table and you should be safe. /'s reference number is / by the way.
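
If you would rather not type such a replace table by hand, PHP can generate one. The following is only a sketch: the helper name `buildEntityTable` and the sample XML are invented here, and it covers the HTML 4.01 entity set that `get_html_translation_table()` knows about, so newer HTML5-only names would still have to be added manually. It assumes the iconv extension is available.

```php
<?php
// Hypothetical helper: map HTML 4.01 named entities to numeric references.
function buildEntityTable(): array
{
    // Entities XML already predefines must be left alone.
    $xmlPredefined = ['&amp;', '&lt;', '&gt;', '&quot;', '&apos;'];
    $table = [];
    // The translation table maps UTF-8 characters to entities like "&Oslash;".
    $map = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    foreach ($map as $char => $entity) {
        if (in_array($entity, $xmlPredefined, true) || strpos($entity, '&#') === 0) {
            continue;
        }
        // Convert the character to UTF-32 to read its code point as an integer.
        $codePoint = unpack('N', iconv('UTF-8', 'UTF-32BE', $char))[1];
        $table[$entity] = '&#' . $codePoint . ';';
    }
    return $table;
}

$table = buildEntityTable();
$raw = '<person><forenames>B&Oslash;IE &amp; Co</forenames></person>';
$fixed = strtr($raw, $table);   // replaces every known named entity in one pass
$xml = simplexml_load_string($fixed);
echo $fixed, "\n";
```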
