PHP Simple HTML DOM Parser - Link element in RSS

I just started using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/) and have some problems parsing XML.

I can perfectly parse all the links from HTML documents, but parsing links from RSS feeds (XML format) doesn't work. For example, I want to parse all the links from http://www.bing.com/search?q=ipod&count=50&first=0&format=rss so I use this code:

$content = file_get_html('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');

foreach($content->find('item') as $entry)
{
$item['title']     = $entry->find('title', 0)->plaintext;
$item['description']    = $entry->find('description', 0)->plaintext;
$item['link'] = $entry->find('link', 0)->plaintext;
$parsed_results_array[] = $item;
}

print_r($parsed_results_array);

The script parses title and description but link element is empty. Any ideas? My guess is that "link" is reserved word or something, so how do I get the parser to work?

标签： php xml simplexml

3条回答

虎瘦雄心在

2楼-- · 2019-02-19 21:23

I suggest you use the right tool for this job. Use SimpleXML: Plus, its built-in :)

$xml = simplexml_load_file('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');
$parsed_results_array = array();
foreach($xml as $entry) {
    foreach($entry->item as $item) {
        // $parsed_results_array[] = json_decode(json_encode($item), true);
        $items['title'] = (string) $item->title;
        $items['description'] = (string) $item->description;
        $items['link'] = (string) $item->link;
        $parsed_results_array[] = $items;
    }
}

echo '<pre>';
print_r($parsed_results_array);

Should yield something like:

Array
(
    [0] => Array
        (
            [title] => Apple - iPod
            [description] => Learn about iPod, Apple TV, and more. Download iTunes for free and purchase iTunes Gift Cards. Check out the most popular TV shows, movies, and music.
            [link] => http://www.apple.com/ipod/
        )

    [1] => Array
        (
            [title] => iPod - Wikipedia, the free encyclopedia
            [description] => The iPod is a line of portable media players designed and marketed by Apple Inc. The first line was released on October 23, 2001, about 8½ months after ...
            [link] => http://en.wikipedia.org/wiki/IPod
        )

0人赞添加讨论(0) 举报

太酷不给撩

3楼-- · 2019-02-19 21:25

edit simple_html_doom class

protected $self_closing_tags

delete key "link"

BEFORE:

protected $self_closing_tags = array('img'=>1, 'br'=>1,'link'=>1, 'input'=>1, 'meta'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);

AFTER:

protected $self_closing_tags = array('img'=>1, 'br'=>1, 'input'=>1, 'meta'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);

0人赞添加讨论(0) 举报

爷、活的狠高调

4楼-- · 2019-02-19 21:38

If you are used to use PHP Simple HTML DOM, you can keep using it! Too many approaches would make confusions, and simplehtmldom is already easy and powerful.

Be sure you start like this:

require_once('lib/simple_html_dom.php');

$content =  file_get_contents('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');
$xml = new simple_html_dom();
$xml->load($content);

Then you can go with you queries!

0人赞添加讨论(0) 举报

PHP Simple HTML DOM Parser - Link element in RSS

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间