PHP Simple HTML DOM Parser - Link element in RSS

Posted 2019-02-19 21:02

I just started using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/) and have some problems parsing XML.

I can perfectly parse all the links from HTML documents, but parsing links from RSS feeds (XML format) doesn't work. For example, I want to parse all the links from http://www.bing.com/search?q=ipod&count=50&first=0&format=rss so I use this code:

$content = file_get_html('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');
$parsed_results_array = array();

foreach ($content->find('item') as $entry) {
    $item = array();
    $item['title']       = $entry->find('title', 0)->plaintext;
    $item['description'] = $entry->find('description', 0)->plaintext;
    $item['link']        = $entry->find('link', 0)->plaintext;   // comes back empty
    $parsed_results_array[] = $item;
}

print_r($parsed_results_array);

The script parses the title and description, but the link element is empty. Any ideas? My guess is that "link" is a reserved word or something, so how do I get the parser to work?
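A likely cause: simple_html_dom is an HTML parser, and in HTML <link> is a void (self-closing) element that never has content, so the URL text that follows it in an RSS feed is not attached to the element. The sketch below reproduces the same effect with PHP's built-in DOMDocument, used here only as a stand-in (it is a different parser from simple_html_dom), by loading one RSS fragment once in HTML mode and once in XML mode. The feed fragment is made-up sample data.

```php
<?php
// Illustration only: PHP's DOM extension, NOT simple_html_dom, showing how an
// HTML-mode parser treats <link> as a void element while an XML parser does not.
libxml_use_internal_errors(true); // silence warnings about unknown tags such as <rss>

$rss = '<rss><channel><item><title>Apple - iPod</title>'
     . '<link>http://www.apple.com/ipod/</link></item></channel></rss>';

// HTML mode: <link> is closed immediately, so the URL ends up outside it.
$html = new DOMDocument();
$html->loadHTML($rss);
$htmlLink = $html->getElementsByTagName('link')->item(0)->textContent;

// XML mode: <link> is an ordinary element that contains the URL.
$xml = new DOMDocument();
$xml->loadXML($rss);
$xmlLink = $xml->getElementsByTagName('link')->item(0)->textContent;

var_dump($htmlLink);
var_dump($xmlLink);
```

The takeaway is that the problem is not a reserved word: any parser working in HTML mode can lose the contents of <link>, while an XML parser keeps them.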

3 answers
虎瘦雄心在
Reply #2 · 2019-02-19 21:23

I suggest you use the right tool for this job: SimpleXML. Plus, it's built in :)

$xml = simplexml_load_file('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');
if ($xml === false) {
    die('Could not load the RSS feed');
}

$parsed_results_array = array();
// An RSS 2.0 document is structured as <rss><channel><item>...</item></channel></rss>
foreach ($xml->channel->item as $item) {
    // Alternative: $parsed_results_array[] = json_decode(json_encode($item), true);
    $parsed_results_array[] = array(
        'title'       => (string) $item->title,       // cast SimpleXMLElement to string
        'description' => (string) $item->description,
        'link'        => (string) $item->link,
    );
}

echo '<pre>';
print_r($parsed_results_array);

This should yield something like:

Array
(
    [0] => Array
        (
            [title] => Apple - iPod
            [description] => Learn about iPod, Apple TV, and more. Download iTunes for free and purchase iTunes Gift Cards. Check out the most popular TV shows, movies, and music.
            [link] => http://www.apple.com/ipod/
        )

    [1] => Array
        (
            [title] => iPod - Wikipedia, the free encyclopedia
            [description] => The iPod is a line of portable media players designed and marketed by Apple Inc. The first line was released on October 23, 2001, about 8½ months after ...
            [link] => http://en.wikipedia.org/wiki/IPod
        )
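If the Bing URL is unreachable, the same SimpleXML approach can be tried offline. This is a minimal sketch assuming an inline RSS 2.0 string (made-up sample data) loaded with simplexml_load_string instead of simplexml_load_file:

```php
<?php
// Offline sketch: parse an inline RSS 2.0 string with SimpleXML.
// The feed content is sample data, not a real Bing response.
$rss = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Apple - iPod</title>
      <description>Learn about iPod.</description>
      <link>http://www.apple.com/ipod/</link>
    </item>
    <item>
      <title>iPod - Wikipedia</title>
      <description>The iPod is a line of portable media players.</description>
      <link>http://en.wikipedia.org/wiki/IPod</link>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($rss);
$parsed_results_array = array();
foreach ($xml->channel->item as $item) {
    $parsed_results_array[] = array(
        'title'       => (string) $item->title,
        'description' => (string) $item->description,
        'link'        => (string) $item->link, // an ordinary element in XML, so it keeps its text
    );
}
print_r($parsed_results_array);
```

Because SimpleXML is a real XML parser, it has no notion of HTML void elements, which is why <link> comes back populated here.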
太酷不给撩
Reply #3 · 2019-02-19 21:25

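Edit the simple_html_dom class: in the protected $self_closing_tags property, delete the "link" key. The parser treats <link> as a self-closing HTML tag, so the URL text that follows it in an RSS feed is never attached to the element.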

BEFORE:

protected $self_closing_tags = array('img'=>1, 'br'=>1,'link'=>1, 'input'=>1, 'meta'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);

AFTER:

protected $self_closing_tags = array('img'=>1, 'br'=>1, 'input'=>1, 'meta'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);
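Editing the library file means the change is lost whenever simple_html_dom is upgraded. An alternative that leaves the library untouched (a suggested workaround, not part of the answer above) is to rename the tag before handing the feed to the parser. The helper rename_link_tags and the replacement tag name rsslink below are hypothetical choices for illustration:

```php
<?php
// Hypothetical helper (not part of simple_html_dom): rename <link>...</link>
// to <rsslink>...</rsslink> so an HTML parser no longer treats it as a void tag.
function rename_link_tags(string $feed): string
{
    // Matches only bare <link> and </link>; tags with attributes such as
    // <atom:link href="..."/> are left alone.
    return preg_replace('#<(/?)link>#i', '<$1rsslink>', $feed);
}

$feed = '<item><title>Apple - iPod</title><link>http://www.apple.com/ipod/</link></item>';
echo rename_link_tags($feed), "\n";

// With simple_html_dom loaded, usage would then look something like:
// $html = str_get_html(rename_link_tags(file_get_contents($url)));
// $link = $entry->find('rsslink', 0)->plaintext;
```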
爷、活的狠高调
Reply #4 · 2019-02-19 21:38

If you are used to using PHP Simple HTML DOM, you can keep using it! Mixing too many approaches only causes confusion, and simplehtmldom is already easy to use and powerful.

Be sure you start like this:

require_once('lib/simple_html_dom.php');

$content =  file_get_contents('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');
$xml = new simple_html_dom();
$xml->load($content);

Then you can run your queries!
