The RSS file is shown below. I want to get the content of the media:group section. I checked the feedparser documentation, but it doesn't seem to mention this. How can I do it? Any help is appreciated.
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:ymusic="http://music.yahoo.com/rss/1.0/ymusic/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel>
<title>XYZ InfoX: Special hello </title>
<link>http://www1.XYZInfoX.com/learninghello/home</link>
<description>hello</description>
<language>en</language> <copyright />
<pubDate>Wed, 17 Mar 2010 08:50:06 GMT</pubDate>
<dc:creator />
<dc:date>2010-03-17T08:50:06Z</dc:date>
<dc:language>en</dc:language> <dc:rights />
<image>
<title>Voice of America</title>
<link>http://www1.XYZInfoX.com/learninghello</link>
<url>http://media.XYZInfoX.com/designimages/XYZRSSIcon.gif</url>
</image>
<item>
<title>Who Were the Deadliest Gunmen of the Wild West?</title>
<link>http://www1.XYZInfoX.com/learninghello/home/Deadliest-Gunmen-of-the-Wild-West-87826807.html</link>
<description> The story of two of them: "Killin'" Jim Miller was an outlaw, "Texas" John Slaughter was a lawman | EXPLORATIONS </description>
<pubDate>Wed, 17 Mar 2010 00:38:48 GMT</pubDate>
<guid isPermaLink="false">87826807</guid>
<dc:creator></dc:creator>
<dc:date>2010-03-17T00:38:48Z</dc:date>
<media:group>
<media:content url="http://media.XYZInfoX.com/images/archives_peace_comm_480_16mar_se.jpg" medium="image" isDefault="true" height="300" width="480" />
<media:content url="http://media.XYZInfoX.com/images/archives_peace_comm_230_16mar_se_edited-1.jpg" medium="image" isDefault="false" height="230" width="230" />
<media:content url="http://media.XYZInfoX.com/images/tex_trans_lawmans_230_16mar10_se.jpg" medium="image" isDefault="false" height="230" width="230" />
<media:content url="http://www.XYZInfoX.com/MediaAssets2/learninghello/dalet/se-exp-outlaws-part2-17mar2010.Mp3" type="audio/mpeg" medium="audio" isDefault="false" />
</media:group>
</item>
</channel>
</rss>
You can parse the feed using feedparser.parse() and then access its XML elements using either Python's attribute access or dictionary-like access on the returned feed object and its subelements. The former method won't work for an element name like media:content, so use the latter method. The rest should become clear after studying the examples at http://www.feedparser.org
feedparser 4.1, as available from PyPI, has this bug.
The solution for me was to get the latest feedparser.py (4.2 pre) from the repository.
With that version you can access all MRSS items, which should do the job for you.
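If upgrading feedparser is not an option, the media:group contents can also be read with the standard library alone. This is a different technique from the one in the answers (plain xml.etree.ElementTree rather than feedparser), shown only as a minimal sketch. The RSS string is a trimmed stand-in for the feed in the question, and the function name media_group_contents is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns:media declaration in the feed.
MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}

# Trimmed stand-in for the feed in the question (two of the four media items).
RSS = """<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0"><channel>
<title>XYZ InfoX: Special hello</title>
<item>
<title>Who Were the Deadliest Gunmen of the Wild West?</title>
<media:group>
<media:content url="http://media.XYZInfoX.com/images/archives_peace_comm_480_16mar_se.jpg" medium="image" isDefault="true" height="300" width="480" />
<media:content url="http://www.XYZInfoX.com/MediaAssets2/learninghello/dalet/se-exp-outlaws-part2-17mar2010.Mp3" type="audio/mpeg" medium="audio" isDefault="false" />
</media:group>
</item>
</channel></rss>"""


def media_group_contents(rss_text):
    """Return (item title, attribute dict) for each media:content in a media:group."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(rss_text.encode("utf-8"))
    results = []
    for item in root.iter("item"):
        title = item.findtext("title")
        # Only media:content elements nested directly inside media:group.
        for content in item.findall("media:group/media:content", MEDIA_NS):
            results.append((title, dict(content.attrib)))
    return results


if __name__ == "__main__":
    for title, attrs in media_group_contents(RSS):
        print(title, attrs.get("medium"), attrs.get("url"))
```

Unlike feedparser, ElementTree requires well-formed XML, so a truncated or malformed feed will raise a ParseError here.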