SimpleXML XML Parsing [closed]

Posted 2019-07-27 17:46

Question:

I have created a script that takes XML from a URL, updates a MySQL database, and writes the parsed data to a CSV file.

I am getting HTML strings in the XML, and they should not be there. How can I remove them while parsing?

I load the XML file like this:

$xml = simplexml_load_file(utf8_encode($xml_url), 'SimpleXMLElement', LIBXML_NOCDATA);
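A minimal sketch, assuming PHP 7.1+ with the SimpleXML extension: the same LIBXML_NOCDATA load, but with libxml's internal error buffer switched on so that parse failures come back as a list of messages instead of a series of warnings. `parse_xml_or_errors` is a hypothetical helper. It takes the XML as a string (for example the body returned by `file_get_contents($xml_url)`) rather than the URL, so the raw response can also be inspected.

```php
<?php
// Hypothetical helper: parse an XML string with libxml's internal error
// buffer enabled, so problems are collected instead of printed as warnings.
function parse_xml_or_errors(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // remember the old setting
    libxml_clear_errors();

    $doc = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError carries a line number and a message ending in "\n".
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    // simplexml_load_string() returns false when the document is not well-formed.
    return [$doc === false ? null : $doc, $messages];
}
```

With this, a response that starts with an HTML doctype yields `null` plus the collected parser messages, which can be logged together with the first few hundred bytes of the body.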

This is the error I get when running the script:

Warning: simplexml_load_file() [function.simplexml-load-file]: http://domain.com/api/get_catalog.php?id=351&user=878&key=b8:1: parser error : Space required after the Public Identifier in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: http://domain.com/api/get_catalog.php?id=351&user=878&key=b8:1: parser error : SystemLiteral " or ' expected in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: http://domain.com/api/get_catalog.php?id=351&user=878&key=b8:1: parser error : SYSTEM or PUBLIC, the URI is missing in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59

Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /dokumenti/skripte/xmlupdate/lost/test/lost_xml.php on line 59
xml $ not loaded.

When I open the URL in Firefox and save the XML to disk, I have no problem parsing the saved file; the problem only occurs when I try to load it directly from the URL.
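One way to see why the browser and the script disagree is to look at the response PHP itself receives. The sketch below assumes PHP's `http://` stream wrapper (which is what `simplexml_load_file` uses for URLs) and summarises the raw header lines that wrapper leaves in `$http_response_header`. `summarize_headers` is a hypothetical helper.

```php
<?php
// Hypothetical helper: pull the final status code and Content-Type out of the
// raw header lines that PHP's http:// wrapper stores in $http_response_header.
function summarize_headers(array $headers): array
{
    $status = null;
    $contentType = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK". After redirects there are
        // several, so keep the last one and reset the headers that follow it.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
            $contentType = null;
        } elseif (stripos($line, 'Content-Type:') === 0) {
            $contentType = trim(substr($line, strlen('Content-Type:')));
        }
    }
    return ['status' => $status, 'content_type' => $contentType];
}

// Usage against the real endpoint (not executed here; $xml_url is the question's variable).
// 'ignore_errors' makes file_get_contents() return the body even for 4xx/5xx responses.
// $ctx  = stream_context_create(['http' => ['ignore_errors' => true]]);
// $body = file_get_contents($xml_url, false, $ctx);
// print_r(summarize_headers($http_response_header));
// echo substr($body, 0, 200);
```

A `text/html` content type, a redirect to a login page, or a 401/403 status would all point at the script receiving a different response than the browser does.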

The XML looks fine. Here is part of it:

<?xml version="1.0" encoding="UTF-8"?>
<RecroKatalog>
<viewCustomerDiscount>
    <BrojArtikla>10214</BrojArtikla>
    <Naziv>Eksterno kucište 2.5&quot; S-ATA+IDE HDD, Aluminium, USB 2.0</Naziv>
    <NetoPrice>81.8224</NetoPrice>
    <Status>Dostupno</Status>
    <Opis></Opis>
    <dugi_opis>Isporucuje se u SIVOJ boji</dugi_opis>
    <Image>http://shop.lost.hr/data/images/big/10.jpg</Image>
    <WEB_Grupa>Ladice i eksterna kucišta - OSTALO</WEB_Grupa>
    <Akcija>0</Akcija>
    <Proizvodjac></Proizvodjac>
    <Klasifikacija>PH-25SD-B/VK220</Klasifikacija>
</viewCustomerDiscount>
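Since the end goal is a CSV file, here is a minimal sketch of turning the catalogue structure above into CSV rows with `fputcsv()`. It assumes every item is a `viewCustomerDiscount` element directly under the root. The column list is a hypothetical selection from the sample, and the sample is closed with `</RecroKatalog>` so that it parses on its own.

```php
<?php
// Hypothetical column selection taken from the sample; the real feed may differ.
const CATALOG_COLUMNS = ['BrojArtikla', 'Naziv', 'NetoPrice', 'Status', 'Image'];

// Write a header row, then one CSV row per <viewCustomerDiscount> element.
// Returns the number of data rows written.
function catalog_to_csv(SimpleXMLElement $catalog, $handle): int
{
    // The escape argument is passed explicitly to avoid deprecation notices on newer PHP.
    fputcsv($handle, CATALOG_COLUMNS, ',', '"', '\\');
    $rows = 0;
    foreach ($catalog->viewCustomerDiscount as $item) {
        $row = [];
        foreach (CATALOG_COLUMNS as $column) {
            // Missing or empty elements become empty strings.
            $row[] = trim((string) $item->{$column});
        }
        fputcsv($handle, $row, ',', '"', '\\');
        $rows++;
    }
    return $rows;
}

$sample = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<RecroKatalog>
<viewCustomerDiscount>
    <BrojArtikla>10214</BrojArtikla>
    <Naziv>Eksterno kucište 2.5&quot; S-ATA+IDE HDD, Aluminium, USB 2.0</Naziv>
    <NetoPrice>81.8224</NetoPrice>
    <Status>Dostupno</Status>
    <Image>http://shop.lost.hr/data/images/big/10.jpg</Image>
</viewCustomerDiscount>
</RecroKatalog>
XML;

// Print the CSV for the sample; for a real file, pass fopen('catalog.csv', 'w') instead.
catalog_to_csv(simplexml_load_string($sample, 'SimpleXMLElement', LIBXML_NOCDATA), fopen('php://output', 'w'));
```

The `&quot;` entity is decoded by the XML parser, and `fputcsv()` then quotes the `Naziv` field because it contains both a double quote and commas.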

Answer 1:

There are some HUGE clues in the error messages. The parser is complaining about this line:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">

That is the start of an HTML document served by that website, not the XML you're looking for.
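Since the failure comes from the server sending HTML, a cheap guard before parsing can turn the series of warnings into one clear message. This is a heuristic sketch, not a validator: `looks_like_html` is a hypothetical helper that only inspects the first bytes of the response body.

```php
<?php
// Hypothetical helper: does the body start like an HTML page rather than XML?
// Only the first 512 bytes are examined, so this is a heuristic.
function looks_like_html(string $body): bool
{
    // Drop a UTF-8 byte-order mark and leading whitespace before inspecting.
    $head = ltrim(preg_replace('/^\xEF\xBB\xBF/', '', substr($body, 0, 512)));
    // Matches "<!DOCTYPE html ..." (any case) or an opening <html> tag.
    return (bool) preg_match('/^<(!DOCTYPE\s+html|html[\s>])/i', $head);
}

// Usage sketch (not executed here; $xml_url is the question's variable):
// $body = file_get_contents($xml_url);
// if (looks_like_html($body)) {
//     exit("Got an HTML page instead of XML:\n" . substr($body, 0, 300) . "\n");
// }
// $xml = simplexml_load_string($body, 'SimpleXMLElement', LIBXML_NOCDATA);
```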

This usually happens when you have to authenticate against the remote service (which is why it works in your browser, where you are logged in), but you're not telling SimpleXML to do that for you.
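If authentication is the cause, SimpleXML has to be given the credentials. A minimal sketch, assuming the service accepts HTTP Basic authentication (the answer does not say which scheme it uses; a login form with a session cookie would need a `Cookie` header instead). `libxml_set_streams_context()` sets the stream context that libxml uses when it next loads a document, including through `simplexml_load_file()`. `basic_auth_context` and the credentials below are hypothetical placeholders.

```php
<?php
// Hypothetical helper: an http:// stream context that sends HTTP Basic credentials.
// Basic auth is an assumption; the real service may use a different scheme.
function basic_auth_context(string $username, string $password)
{
    $token = base64_encode($username . ':' . $password);
    return stream_context_create([
        'http' => [
            'header' => "Authorization: Basic {$token}\r\n",
        ],
    ]);
}

// Usage sketch (placeholders, not executed here):
// libxml_set_streams_context(basic_auth_context('your-user', 'your-password'));
// $xml = simplexml_load_file($xml_url, 'SimpleXMLElement', LIBXML_NOCDATA);
```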