SimplePie not parsing Google News RSS feed

Posted 2019-07-24 07:31

Question:

This code works with every other RSS feed I have tried, but not with Google News feeds. I don't know what I am doing wrong; I suspect a bug. I keep getting this error when I try to read a Google News feed:

This XML document is invalid, likely due to invalid characters. XML error: SYSTEM or PUBLIC, the URI is missing at line 1, column 61

For example, the http://stackoverflow.com/feeds feed works fine, but Google News feeds do not. Can someone give me a hint?

<?php

    //get the simplepie library
    require_once('simplepie.inc');

    //grab the feed
    $feed = new SimplePie();

    $feed->set_feed_url("http://news.google.com/news?hl=en&gl=us&q=austria&ie=UTF-8&output=rss");
    $feed->force_feed(true);
    //$feed->encode_instead_of_strip(true);


    //enable caching
    $feed->enable_cache(true);

    //provide the caching folder
    $feed->set_cache_location('cache');

    //set the amount of seconds you want to cache the feed
    $feed->set_cache_duration(1800);

    //init the process
    $feed->init();

    //let simplepie handle the content type (atom, RSS...)
    $feed->handle_content_type();

?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>simple</title>
</head>

<body>
<div id="page-wrap">

    <h1>News Finder</h1>

    <?php if ($feed->error): ?>
      <p><?php echo $feed->error; ?></p>
    <?php endif; ?>

    <?php foreach ($feed->get_items() as $item): ?>

        <div class="chunk">

            <?php $source = $item->get_feed(); // feed this item belongs to; avoids overwriting $feed inside the loop ?>
            <h4 style="background:url(<?php echo $source->get_favicon(); ?>) no-repeat; text-indent: 25px; margin: 0 0 10px;"><a href="<?php echo $item->get_permalink(); ?>"><?php echo $item->get_title(); ?></a></h4>

            <p class="footnote">Source: <a href="<?php echo $source->get_permalink(); ?>"><?php echo $source->get_title(); ?></a> | <?php echo $item->get_date('j M Y | g:i a T'); ?></p>

        </div>

    <?php endforeach; ?>


</div>
</body>
</html>

Answer 1:

Make sure you're using SimplePie 1.2.1; version 1.2 had a bug in URL parsing that can cause this type of error.
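Since the fix depends on which release is installed, a quick version check can help. This is a minimal sketch: `simplepie_needs_upgrade` is a hypothetical helper, and reading the version from a `SIMPLEPIE_VERSION` constant is an assumption about the SimplePie build you have loaded.

```php
<?php
// Hypothetical helper: true when the given SimplePie version string is older
// than 1.2.1, the release recommended in this answer.
function simplepie_needs_upgrade(string $version): bool
{
    // version_compare understands dotted version strings such as "1.2" or "1.2.1"
    return version_compare($version, '1.2.1', '<');
}

// Usage, assuming the loaded SimplePie build defines SIMPLEPIE_VERSION:
// if (defined('SIMPLEPIE_VERSION') && simplepie_needs_upgrade(SIMPLEPIE_VERSION)) {
//     echo "SimplePie ", SIMPLEPIE_VERSION, " is older than 1.2.1; please upgrade.\n";
// }
```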

(I'm also the SimplePie lead developer, so feel free to shoot questions straight to my email)

If you are using 1.2.1, this appears to be a manifestation of bug #162, which is currently unconfirmed. I'll take an in-depth look into this, but it definitely appears to be an error in SimplePie, not in your code.

(I'll also post back here with why this is occurring for the curious amongst you.)
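The error message reports a line and column in whatever the server actually sent back, so one way to investigate is to fetch the raw response and run it through PHP's own XML parser to see the same errors with positions. A minimal sketch under that assumption; `describe_xml_errors` is a hypothetical helper, not part of SimplePie:

```php
<?php
// Hypothetical helper: parse a raw response body and return any XML errors
// as "line X, column Y: message" strings (empty array when well-formed).
function describe_xml_errors(string $raw): array
{
    // Collect libxml errors instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    simplexml_load_string($raw);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d, column %d: %s',
            $error->line, $error->column, trim($error->message));
    }

    // Restore the caller's error-handling mode
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Usage (assumes allow_url_fopen is enabled): look at what the URL really returns
// $raw = file_get_contents("http://news.google.com/news?hl=en&gl=us&q=austria&ie=UTF-8&output=rss");
// echo substr($raw, 0, 200), "\n";
// print_r(describe_xml_errors($raw));
```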



Answer 2:

I don't know SimplePie; however, in your case the simplest approach might be plain SimpleXML:

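<?php
$url = "http://news.google.com/news?hl=en&gl=us&q=austria&bav=on.2,or.r_gc.r_pw.,cf.osb&biw=1920&bih=973&um=1&ie=UTF-8&output=rss";

// Download and parse the feed; returns false if either step fails
$feed = simplexml_load_file($url);
if ($feed === false) {
    die("Could not load or parse the feed.\n");
}

echo $feed->channel->title, "\n<", $feed->channel->link, ">\n\n";

// RSS 2.0: each <item> is a child of <channel>
foreach ($feed->channel->item as $item)
{
    echo "* $item->title\n  <$item->link>\n";
}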

SimpleXML is normally available in PHP out of the box, so you don't need to install any library.
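The same SimpleXML approach works on a string via `simplexml_load_string`, which makes it easy to try without network access. A sketch with a hypothetical `list_items` helper and a made-up inline feed:

```php
<?php
// Hypothetical helper: return [title, link] pairs for each <item> in an RSS 2.0 document
function list_items(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return [];
    }

    $items = [];
    foreach ($feed->channel->item as $item) {
        // Cast SimpleXMLElement nodes to plain strings
        $items[] = [(string) $item->title, (string) $item->link];
    }
    return $items;
}

// Made-up sample feed, for illustration only
$sample = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample</title>
    <link>http://example.com/</link>
    <item><title>First</title><link>http://example.com/1</link></item>
    <item><title>Second</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML;

foreach (list_items($sample) as [$title, $link]) {
    echo "* $title\n  <$link>\n";
}
```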




Answer 3:

For Google News feeds, fetch the feed yourself and pass the raw XML to SimplePie:

$feed->set_raw_data(file_get_contents($rssurl));
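A slightly more defensive version of the `set_raw_data()` line checks that the download succeeded before handing the body to SimplePie. This is a sketch; `fetch_raw_feed` is a hypothetical helper, and the SimplePie calls in the usage comment are the ones already used in this thread.

```php
<?php
// Hypothetical helper: download the feed body so it can be passed to
// SimplePie's set_raw_data(); returns null when the download fails.
function fetch_raw_feed(string $url): ?string
{
    // '@' suppresses the PHP warning; failure is reported via the return value
    $raw = @file_get_contents($url);
    if ($raw === false || $raw === '') {
        return null;
    }
    return $raw;
}

// Usage with SimplePie (assumes the library is loaded and allow_url_fopen is on):
// $raw = fetch_raw_feed($rssurl);
// if ($raw !== null) {
//     $feed = new SimplePie();
//     $feed->set_raw_data($raw);   // parse this string instead of a feed URL
//     $feed->init();
//     $feed->handle_content_type();
// }
```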


Answer 4:

Just wanted to add a note for others who think the above answer doesn't work. If an item title comes out empty (null), check the feed source: there may be nothing wrong with SimplePie or your script. Your browser may simply be hiding the title because of HTML code inside the item's title tags.
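If a title contains HTML markup, echoing it raw lets the browser interpret the tags, which can make the text seem to disappear. One defensive option is to strip the markup and escape the result before output. A sketch with a hypothetical `plain_title` helper:

```php
<?php
// Hypothetical helper: turn a feed title that may contain HTML (raw or
// entity-encoded) into plain text that is safe to echo into a page.
function plain_title(?string $title): string
{
    if ($title === null) {
        return '';
    }
    // Decode entities such as &lt;b&gt; so the tags can be stripped
    $decoded = html_entity_decode($title, ENT_QUOTES, 'UTF-8');
    $text = trim(strip_tags($decoded));
    // Escape what remains for safe output inside HTML
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Usage inside the question's loop:
// echo plain_title($item->get_title());
```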