PHP Simple HTML DOM Parser - Link element in RSS

Posted 2019-02-19 21:32

Question:

I just started using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/) and have some problems parsing XML.

I can perfectly parse all the links from HTML documents, but parsing links from RSS feeds (XML format) doesn't work. For example, I want to parse all the links from http://www.bing.com/search?q=ipod&count=50&first=0&format=rss so I use this code:

$content = file_get_html('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');

foreach($content->find('item') as $entry)
{
    $item['title']       = $entry->find('title', 0)->plaintext;
    $item['description'] = $entry->find('description', 0)->plaintext;
    $item['link']        = $entry->find('link', 0)->plaintext;
    $parsed_results_array[] = $item;
}

print_r($parsed_results_array);

The script parses the title and description, but the link element is empty. Any ideas? My guess is that "link" is a reserved word or something, so how do I get the parser to work?

Answer 1:

I suggest you use the right tool for this job: SimpleXML. Plus, it's built in :)

$xml = simplexml_load_file('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');
$parsed_results_array = array();
foreach($xml as $entry) {
    foreach($entry->item as $item) {
        // $parsed_results_array[] = json_decode(json_encode($item), true);
        $items['title'] = (string) $item->title;
        $items['description'] = (string) $item->description;
        $items['link'] = (string) $item->link;
        $parsed_results_array[] = $items;
    }
}

echo '<pre>';
print_r($parsed_results_array);

This should yield something like:

Array
(
    [0] => Array
        (
            [title] => Apple - iPod
            [description] => Learn about iPod, Apple TV, and more. Download iTunes for free and purchase iTunes Gift Cards. Check out the most popular TV shows, movies, and music.
            [link] => http://www.apple.com/ipod/
        )

    [1] => Array
        (
            [title] => iPod - Wikipedia, the free encyclopedia
            [description] => The iPod is a line of portable media players designed and marketed by Apple Inc. The first line was released on October 23, 2001, about 8½ months after ...
            [link] => http://en.wikipedia.org/wiki/IPod
        )
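
To try the SimpleXML approach without depending on the live Bing feed, here is a minimal sketch that parses an inline RSS string instead. The sample feed, its example.com URLs, and the direct $xml->channel->item loop are assumptions made for illustration and are not part of the original answer; the real feed may contain extra elements or namespaces.

```php
<?php
// Hypothetical inline feed standing in for the Bing RSS response,
// so the example runs without network access.
$rss = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample results</title>
    <link>http://example.com/search</link>
    <item>
      <title>First result</title>
      <link>http://example.com/first</link>
      <description>First description</description>
    </item>
    <item>
      <title>Second result</title>
      <link>http://example.com/second</link>
      <description>Second description</description>
    </item>
  </channel>
</rss>
XML;

// simplexml_load_string() returns false when the XML cannot be parsed.
$xml = simplexml_load_string($rss);
if ($xml === false) {
    throw new RuntimeException('Could not parse the feed');
}

$parsed_results_array = array();

// Walk the <item> elements directly under <channel>.
foreach ($xml->channel->item as $item) {
    // Cast each SimpleXMLElement to string to get its text content.
    $parsed_results_array[] = array(
        'title'       => (string) $item->title,
        'description' => (string) $item->description,
        'link'        => (string) $item->link,
    );
}

print_r($parsed_results_array);
```

Building a fresh array for every item avoids carrying values over from a previous iteration when a feed entry happens to lack one of the fields.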


Answer 2:

If you are used to PHP Simple HTML DOM, you can keep using it! Mixing too many approaches only causes confusion, and simplehtmldom is already easy and powerful.

Be sure you start like this:

require_once('lib/simple_html_dom.php');

$content =  file_get_contents('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');
$xml = new simple_html_dom();
$xml->load($content);

Then you can go on with your queries!



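Answer 3:

Edit the simple_html_dom class: in the protected $self_closing_tags property, delete the key "link". The parser treats link as a self-closing tag, as in HTML, so the URL text inside an RSS <link> element is not attached to that element.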

BEFORE:

protected $self_closing_tags = array('img'=>1, 'br'=>1,'link'=>1, 'input'=>1, 'meta'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);

AFTER:

protected $self_closing_tags = array('img'=>1, 'br'=>1, 'input'=>1, 'meta'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);