Reading in Malformed XML (unencoded XML entities)

Posted 2020-02-12 23:18

I'm having some trouble parsing malformed XML in PHP. In particular, I'm querying a third-party web service that returns data in an XML format without encoding the XML entities in the actual data. For example, one of the elements contains an ASCII heart, '<3' (without the quotes), which the XML parser sees as an opening tag. It should be '&lt;3'.

Right now I'm simply passing the XML string into a SimpleXMLElement, which, predictably, fails on these instances. I've done some looking around and it seems like the PHP Tidy package might be able to help me, but the amount of configuration you can do is overwhelming :(
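To make the failure concrete, here is a minimal sketch that collects libxml's parse errors instead of letting the constructor throw. The helper name `xmlParseErrors` is hypothetical and only for illustration; the sample string is the heart example from the question.

```php
<?php
// Hypothetical helper: returns libxml's error messages for a string,
// or an empty array when the string parses cleanly.
function xmlParseErrors(string $xml): array
{
    // Collect errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $messages = [];
    if (simplexml_load_string($xml) === false) {
        foreach (libxml_get_errors() as $error) {
            $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
    }

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// The unescaped "<3" makes the parser expect an element name.
foreach (xmlParseErrors("<foo>I <3 Philadelphia</foo>") as $message) {
    echo $message, "\n";
}
```

An empty result means the string was well-formed, so the same helper can be used to check whether a repair step actually worked.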

Thus, I'm just wondering if anyone else has had a problem like this and, if so, how they were able to solve it.

Thanks!

2 Answers
迷人小祖宗
Reply #2 · 2020-02-12 23:40
  1. Read the content as a string.
  2. htmlspecialchars(preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $string))
  3. Load the transformed string into a SimpleXMLElement.

It worked for me so far.
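A rough sketch of a variation on these steps, under some assumptions: running htmlspecialchars over the whole document would also escape the markup itself, so this version swaps in two regular expressions that escape only a "<" or "&" that cannot start a tag or an entity reference. The function name `escapeStrayEntities` is made up for illustration, and the heuristics are not a full XML repair.

```php
<?php
// Hypothetical helper, not the answer's exact code: a heuristic pre-cleaner.
function escapeStrayEntities(string $xml): string
{
    // Remove control characters that XML 1.0 forbids (tab, LF and CR are kept).
    $xml = preg_replace('/[\x00-\x08\x0b\x0c\x0e-\x1f]/', '', $xml);

    // Escape "&" that does not already start an entity or character reference.
    $xml = preg_replace(
        '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $xml
    );

    // Escape "<" that is not followed by a character that can open markup:
    // a letter, "_", ":", "/", "!" or "?".
    $xml = preg_replace('/<(?![A-Za-z_:\/!?])/', '&lt;', $xml);

    return $xml;
}

$raw = "<foo>I <3 Philadelphia & \x07more</foo>";
$el = new SimpleXMLElement(escapeStrayEntities($raw));
echo (string) $el, "\n";
```

The heuristic has limits: a stray "<" followed directly by a letter (for example "a<b") still looks like a tag and will not be escaped, so such data needs a different repair.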

对你真心纯属浪费
Reply #3 · 2020-02-12 23:50

Try tidy.repairString:

php > $tidy = new tidy();
php > $repaired = $tidy->repairString("<foo>I <3 Philadelphia</foo>", array("input-xml"=>1));
php > print($repaired);
<foo>I &lt;3 Philadelphia</foo>
php > $el = new SimpleXMLElement($repaired);
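
Outside the interactive shell, the same idea might be wrapped as below. This is only a sketch: the function name `loadRepairedXml` is hypothetical, it uses the procedural `tidy_repair_string` with the same "input-xml" option, and it returns null when the tidy extension is not installed or the repaired string still does not parse.

```php
<?php
// Hypothetical wrapper around the tidy repair shown above.
function loadRepairedXml(string $xml): ?SimpleXMLElement
{
    // The tidy extension is optional, so check for it first.
    if (!extension_loaded('tidy')) {
        return null;
    }

    // Ask tidy to treat the input as XML rather than HTML.
    $repaired = tidy_repair_string($xml, ['input-xml' => true], 'utf8');
    if (!is_string($repaired)) {
        return null;
    }

    // Parse without emitting warnings; report failure as null.
    $previous = libxml_use_internal_errors(true);
    $el = simplexml_load_string($repaired);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $el === false ? null : $el;
}

$el = loadRepairedXml("<foo>I <3 Philadelphia</foo>");
echo $el === null ? "could not repair\n" : (string) $el . "\n";
```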