Parsing XML using PHP

Published 2019-01-20 14:28


I've consistently had trouble parsing XML with PHP and have never really found "the right way", or at least a standardised way, of parsing XML files.

First, I'm trying to parse this:

     <description><![CDATA[ ><img width="126" alt="" src="" /> ]]></description> 
     <pubDate>Tue, 21 Apr 2009 16:12:31 +0000</pubDate> 
     <media:content url="" fileSize="13065" type="image/jpeg" expression="full"  width="126" height="126" /> 
     <media:thumbnail url="" type="image/jpeg" width="126" height="126" /> 

I'm using this code:

$doc = new DOMDocument();
$doc->load(''); // feed URL goes here
$arrFeeds = array();
foreach ($doc->getElementsByTagName('item') as $node) {
    $itemRSS = array ( 
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
        'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
        'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue
    );
    array_push($arrFeeds, $itemRSS);
}

Now I want to get the url attributes of "media:content" and "media:thumbnail". How would I do that? I think I should be using DOMElement::getAttribute, but I haven't managed to get it to work. Can anyone shed some light on this, and also let me know whether this is a good way to parse XML?

Regards, Shadi


You can use SimpleXML, as suggested by the other posters, but you need to use the children() and attributes() functions to deal with the different namespaces.

Example (untested):

$feed = file_get_contents('');
$xml = new SimpleXMLElement($feed);
foreach ($xml->channel->item as $item) {
    foreach ($item->children('') as $media_element) {
        // each $media_element is a namespaced child such as media:content
    }
}
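
To make the children()/attributes() idea concrete, here is a minimal sketch that runs against an inline feed instead of a remote URL. The sample feed, the example.com URLs and the `$mediaNs` and `$urls` names are assumptions made for illustration; the namespace URI is the Media RSS one, which a real feed should declare on its `<rss>` element.

```php
<?php
// Hypothetical inline feed, used so the example runs without network access.
$feed = <<<'XML'
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First</title>
      <media:content url="http://example.com/full.jpg" type="image/jpeg" />
      <media:thumbnail url="http://example.com/thumb.jpg" type="image/jpeg" />
    </item>
  </channel>
</rss>
XML;

// Assumed namespace URI (Media RSS); check the xmlns:media value in your feed.
$mediaNs = 'http://search.yahoo.com/mrss/';
$xml = new SimpleXMLElement($feed);

$urls = array();
foreach ($xml->channel->item as $item) {
    // children($ns) yields only the child elements bound to that namespace.
    foreach ($item->children($mediaNs) as $mediaElement) {
        // The attributes carry no namespace, so a plain attributes() call sees them.
        $attrs = $mediaElement->attributes();
        // getName() returns the local name: "content" or "thumbnail".
        $urls[$mediaElement->getName()][] = (string) $attrs['url'];
    }
}

print_r($urls);
```

Casting to string matters here: without it, the array would hold SimpleXMLElement objects rather than plain URLs.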

Alternatively, you can use XPath (again, untested):

$feed = file_get_contents('');
$xml = new SimpleXMLElement($feed);
$xml->registerXPathNamespace('media', '');
$images = $xml->xpath('/rss/channel/item/media:content/@url');
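
As a rough illustration of the XPath route, the sketch below selects the url attributes of every media:content element in one query. The inline feed, the example.com URLs and the `$contentUrls` name are assumptions for the example, and the namespace URI is assumed to be the Media RSS one declared by the feed.

```php
<?php
// Hypothetical inline feed with two items.
$feed = <<<'XML'
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First</title>
      <media:content url="http://example.com/a.jpg" type="image/jpeg" />
    </item>
    <item>
      <title>Second</title>
      <media:content url="http://example.com/b.jpg" type="image/jpeg" />
    </item>
  </channel>
</rss>
XML;

$xml = new SimpleXMLElement($feed);
// Bind the "media" prefix used in the query to the namespace URI of the feed.
$xml->registerXPathNamespace('media', 'http://search.yahoo.com/mrss/');

$contentUrls = array();
// The "/@url" step selects the attribute node itself, in document order.
foreach ($xml->xpath('/rss/channel/item/media:content/@url') as $attr) {
    $contentUrls[] = (string) $attr;
}

print_r($contentUrls);
```

The prefix in the query only has to match the one passed to registerXPathNamespace(); it does not need to match the prefix the feed itself uses.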


Try this. It'll work fine.

$doc = new DOMDocument();
$doc->load(''); // feed URL goes here
$arrFeeds = array();
foreach ($doc->getElementsByTagName('item') as $node) {
    $itemRSS = array ( 
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
        'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
        'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue,
        'thumbnail' => $node->getElementsByTagName('thumbnail')->item(0)->getAttribute('url')
    );
    array_push($arrFeeds, $itemRSS);
}


This is how I eventually did it using XMLReader:


define ('XMLFILE', '');
echo "<pre>";

$items = array ();
$i = 0;

$xmlReader = new XMLReader();
$xmlReader->open(XMLFILE, null, LIBXML_NOBLANKS);

$isParserActive = false;
$simpleNodeTypes = array ("title", "description", "media:title", "link", "author", "pubDate", "guid");

while ($xmlReader->read ()) {
    $nodeType = $xmlReader->nodeType;

    // Only deal with Beginning/Ending Tags
    if ($nodeType != XMLReader::ELEMENT && $nodeType != XMLReader::END_ELEMENT) { continue; }
    else if ($xmlReader->name == "item") {
        // A closing </item> finishes the current entry; an opening <item> starts one
        if (($nodeType == XMLReader::END_ELEMENT) && $isParserActive) { $i++; }
        $isParserActive = ($nodeType != XMLReader::END_ELEMENT);
    }

    if (!$isParserActive || $nodeType == XMLReader::END_ELEMENT) { continue; }

    $name = $xmlReader->name;

    if (in_array ($name, $simpleNodeTypes)) {
        // Skip to the text node
        $xmlReader->read ();
        $items[$i][$name] = $xmlReader->value;
    } else if ($name == "media:thumbnail") {
        $items[$i]['media:thumbnail'] = array (
                "url" => $xmlReader->getAttribute("url"),
                "width" => $xmlReader->getAttribute("width"),
                "height" => $xmlReader->getAttribute("height"),
                "type" => $xmlReader->getAttribute("type")
        );
    } else if ($name == "media:content") {
        $items[$i]['media:content'] = array (
                "url" => $xmlReader->getAttribute("url"),
                "width" => $xmlReader->getAttribute("width"),
                "height" => $xmlReader->getAttribute("height"),
                "filesize" => $xmlReader->getAttribute("fileSize"),
                "expression" => $xmlReader->getAttribute("expression")
        );
    }
}

print_r($items);
echo "</pre>";
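
One possible variation on this approach, sketched under assumptions: matching on the namespace URI and local name rather than the literal "media:thumbnail" string keeps the loop working if a feed binds the same namespace to a different prefix. The inline feed, the example.com URL and the `$mediaNs` and `$thumbnails` names are hypothetical.

```php
<?php
// Hypothetical inline feed, parsed from a string instead of a file.
$feed = <<<'XML'
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First</title>
      <media:thumbnail url="http://example.com/thumb.jpg" width="126" height="126" />
    </item>
  </channel>
</rss>
XML;

// Assumed namespace URI (Media RSS); check the xmlns:media value in your feed.
$mediaNs = 'http://search.yahoo.com/mrss/';

$reader = new XMLReader();
$reader->XML($feed); // load the document from a string

$thumbnails = array();
while ($reader->read()) {
    // Look only at opening tags, identified by namespace URI and local name.
    if ($reader->nodeType === XMLReader::ELEMENT
        && $reader->namespaceURI === $mediaNs
        && $reader->localName === 'thumbnail') {
        $thumbnails[] = $reader->getAttribute('url');
    }
}
$reader->close();

print_r($thumbnails);
```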




# Convert the string into XML
$xml = new SimpleXMLElement($_POST['name']);

# Iterate through the XML for the data

$values = "VALUES('' , ";
foreach($xml->item as $item) {
    // you now have access to that item
}



Try using SimpleXML.


You would want something like this:

'content' => $node->getElementsByTagNameNS('', 'content')->item(0)->getAttribute('url'),
'thumbnail' => $node->getElementsByTagNameNS('', 'thumbnail')->item(0)->getAttribute('url'),

I believe that will work, it's been a while since I've done anything like this.
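
For what it's worth, a self-contained sketch of the getElementsByTagNameNS() approach might look like the following. The inline feed and the example.com URLs are assumptions for illustration, the first argument is assumed to be the Media RSS namespace URI declared by the feed, and the null checks guard against items that lack a media element.

```php
<?php
// Hypothetical inline feed, loaded with loadXML() instead of a remote URL.
$feed = <<<'XML'
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First</title>
      <media:content url="http://example.com/full.jpg" type="image/jpeg" />
      <media:thumbnail url="http://example.com/thumb.jpg" type="image/jpeg" />
    </item>
  </channel>
</rss>
XML;

// Assumed namespace URI (Media RSS); check the xmlns:media value in your feed.
$mediaNs = 'http://search.yahoo.com/mrss/';

$doc = new DOMDocument();
$doc->loadXML($feed);

$arrFeeds = array();
foreach ($doc->getElementsByTagName('item') as $node) {
    // item(0) returns null when no matching element exists under this item.
    $content = $node->getElementsByTagNameNS($mediaNs, 'content')->item(0);
    $thumbnail = $node->getElementsByTagNameNS($mediaNs, 'thumbnail')->item(0);

    $arrFeeds[] = array(
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'content' => $content !== null ? $content->getAttribute('url') : '',
        'thumbnail' => $thumbnail !== null ? $thumbnail->getAttribute('url') : '',
    );
}

print_r($arrFeeds);
```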


You may get the error "Call to a member function getAttribute() on a non-object" if a feed is missing entries such as thumbnail. So while I like @Helder Robalo's answer, you should check that a node exists before calling methods like getAttribute() on it:


header('Content-type: text/plain; charset=utf-8');

$doc = new DOMDocument();
$doc->load(''); // feed URL goes here
$arrFeeds = array();
foreach ($doc->getElementsByTagName('item') as $node) {
    $itemRSS = array (
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
        'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
        'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue
    );

    // item(0) returns null when the entry has no thumbnail element
    $thumbnail = $node->getElementsByTagName('thumbnail')->item(0);
    if ($thumbnail !== null)
        $itemRSS['thumbnail'] = $thumbnail->getAttribute('url');
    else
        $itemRSS['thumbnail'] = '';

    array_push($arrFeeds, $itemRSS);
}



media:content attributes are actually pretty easy to get with SimpleXML:


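  // $x is the SimpleXMLElement built from the feed
  foreach($x->channel->item as $entry) {
    // attributes of the first child element in the media namespace
    $media = $entry->children('')->attributes();
    $url = (string) $media['url'];
  }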