<![CDATA[ in SimplePie

Posted 2019-08-07 09:38

I've been working on an RSS scraper that parses data from multiple sources. However, each of these sources formats the item description in its own way.

One source in particular wraps the description in a CDATA section, for example:

<![CDATA[
<p align=justify><font face="verdana, arial, helvetica, sans-serif" size=1>
<font color=#004080></font>
SOME TEXT GOES HERE 
 </font></p>
]]>

However, when I try to get the item description with SimplePie, I get this output:

<div><p align="justify"></p></div>

This is the PHP code I'm using:

foreach ($feed->get_items() as $item)
{
    $title = $item->get_title();
    $description = $item->get_description();
    // some other stuff
}

Now the interesting part:

The feed's title is also wrapped in CDATA, like this:

<title>
  <![CDATA[ 
     Nice title
  ]]>
</title>

And... the title comes through just fine!
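One way to see where the text goes missing is to parse a sample of this feed with PHP's built-in SimpleXML instead of SimplePie. The sketch below uses a hand-made sample item modeled on the snippets above (not the real feed). An XML parser treats a CDATA section as ordinary text, so both the title and the full description markup, including SOME TEXT GOES HERE, should come back intact. That suggests the text is lost later, when SimplePie processes the description as HTML; the title contains no markup, which may be why it survives.

```php
<?php
// Hand-made sample item mirroring the feed in the question (illustrative data only).
$xml = <<<XML
<rss version="2.0"><channel><item>
<title><![CDATA[ Nice title ]]></title>
<description><![CDATA[
<p align=justify><font face="verdana, arial, helvetica, sans-serif" size=1>
<font color=#004080></font>
SOME TEXT GOES HERE
 </font></p>
]]></description>
</item></channel></rss>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$rss = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
$item = $rss->channel->item[0];

// Casting to string yields the element's text; trim() drops the padding
// around the CDATA contents.
$title = trim((string) $item->title);
$description = trim((string) $item->description);

echo $title, "\n";        // the plain title text
echo $description, "\n";  // the raw HTML, <font> tags and text included
```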

How can I get the item description from this feed? I've tried almost everything!

Thank you!

1 answer
爱情/是我丢掉的垃圾
Reply #2 · 2019-08-07 10:23

The get_description() and get_content() methods both sanitize the raw data, but you can use the get_item_tags() method to extract it untouched, like this:

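// '' (empty namespace) is RSS 2.0, i.e. SIMPLEPIE_NAMESPACE_RSS_20
$desc_tags = $item->get_item_tags('', 'description');
if ($desc_tags) {
    // 'data' holds the element's contents without SimplePie's HTML sanitizing
    echo $desc_tags[0]['data'];
}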

The only caveat: get_content() and get_description() try to detect the namespace themselves, whereas get_item_tags() requires you to pass it explicitly. SimplePie defines constants for the namespaces it knows (for example SIMPLEPIE_NAMESPACE_RSS_20, the empty RSS 2.0 namespace). If you know each feed's format beforehand, that should not be a problem; otherwise you might need to replicate the trial and error that get_description() does.
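If the format is not known in advance, that trial and error could be wrapped in a small helper. The sketch below is hypothetical: the function name first_raw_tag_data and the idea of passing an ordered list of namespaces are my own, not part of SimplePie's API, and the order in which you try namespaces is an assumption to adjust to your feeds. It relies only on get_item_tags() returning null when nothing matches and otherwise an array of matches, each with a 'data' key.

```php
<?php
/**
 * Hypothetical helper: return the raw contents of the first matching element,
 * trying each namespace in the given order. $item is expected to be a
 * SimplePie item, or anything exposing get_item_tags($namespace, $tag).
 */
function first_raw_tag_data($item, string $tag, array $namespaces): ?string
{
    foreach ($namespaces as $namespace) {
        $tags = $item->get_item_tags($namespace, $tag);
        // null means no element with this tag in this namespace; otherwise
        // take the first match's unsanitized 'data'.
        if (!empty($tags) && isset($tags[0]['data'])) {
            return $tags[0]['data'];
        }
    }
    return null; // no namespace produced a match
}
```

With SimplePie loaded, a call might look like first_raw_tag_data($item, 'description', [SIMPLEPIE_NAMESPACE_RSS_20, SIMPLEPIE_NAMESPACE_RSS_10, SIMPLEPIE_NAMESPACE_RSS_090]). Atom feeds use summary and content rather than description, so they would need a different tag name as well as a different namespace.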
