Reading text in `<![CDATA[…]]>` with SimpleXMLElement

Posted 2020-03-04 04:21

Question:

I'm importing an RSS feed with SimpleXMLElement in PHP. I'm having trouble with the title and description. For some reason, the website I get the feed from puts the title and description in <![CDATA[...]]>:

<item>
<title><![CDATA[...title...]]></title>
<link>...url...</link>
<description><![CDATA[...title...]]></description>
<pubDate>...date...</pubDate>
<guid>...link...</guid>
</item>

When I do a var_dump() on the SimpleXMLElement, I get (for this part):

  [2]=>
  object(SimpleXMLElement)#5 (5) {
    ["title"]=>
    object(SimpleXMLElement)#18 (0) {
    }
    ["link"]=>
    string(95) "...link..."
    ["description"]=>
    object(SimpleXMLElement)#19 (0) {
    }
    ["pubDate"]=>
    string(31) "...date..."
    ["guid"]=>
    string(48) "...link..."
  }

How can I get the value in <![CDATA[...]]> to read the title and description from the feed?

Answer 1:

SimpleXML reads CDATA nodes absolutely fine. The only problem you're having is that print_r, var_dump, and similar functions don't give an accurate representation of SimpleXML objects, because those objects are implemented inside the extension rather than as ordinary PHP objects with real properties.

If you run echo $myNode->description you will see the content of the CDATA section just fine. The reason is that when you ask for a SimpleXMLElement to be converted to a string, it automatically combines all the text and CDATA content for you - but until you do, it remembers the distinction.
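As a rough illustration of the point above, here is a minimal sketch that parses a made-up feed fragment and echoes the CDATA content. The feed text, the URL, and the variable names ($xml, $feed, $item) are assumptions for the example, not taken from the actual feed in the question.

```php
<?php
// Hypothetical feed fragment; titles and URL are invented for illustration.
$xml = <<<XML
<rss><channel>
  <item>
    <title><![CDATA[Example title]]></title>
    <link>http://example.com/post</link>
    <description><![CDATA[Some <b>description</b> text]]></description>
  </item>
</channel></rss>
XML;

$feed = new SimpleXMLElement($xml);
$item = $feed->channel->item[0];

// echo converts each element to a string, which includes its CDATA content.
echo $item->title, "\n";        // prints: Example title
echo $item->description, "\n";  // prints: Some <b>description</b> text
```

The markup inside the CDATA section comes back as literal text, so escape it yourself if you output it into an HTML page.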

As a general case, to extract the string content of any element or attribute in SimpleXML, cast to string with (string)$myNode. This also prevents other issues, such as functions complaining about getting an object when they were expecting a string, or failure to serialize when saving to a session.
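A small sketch of that casting habit, assuming a hypothetical item with a made-up guid attribute: each value is cast to a string before it is stored, so the resulting array holds only plain strings and can be serialized (for example, into a session) without involving SimpleXMLElement objects.

```php
<?php
// Hypothetical item; the content and the isPermaLink value are invented.
$xml = '<item>'
     . '<title><![CDATA[Example title]]></title>'
     . '<guid isPermaLink="false">abc-123</guid>'
     . '</item>';

$item = new SimpleXMLElement($xml);

// Cast elements and attributes to plain strings before using them elsewhere.
$row = [
    'title'       => (string) $item->title,
    'guid'        => (string) $item->guid,
    'isPermaLink' => (string) $item->guid['isPermaLink'],
];

// An array of plain strings serializes and round-trips without trouble.
$packed   = serialize($row);
$restored = unserialize($packed);

var_dump($restored === $row);
```

Casting at the point of extraction keeps the rest of the code free of SimpleXMLElement objects, which is usually what string functions and session storage expect.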

See also my previous answer at https://stackoverflow.com/a/13830559/157957