Python (newbie) Parse XML from API call

I've look for some tutorials/other questions on stack/documentation and still can't figure it out. ugh!!!

Making the API request and the parsing out (want to assign to variables but that's a bonus to this question), This is what I'm trying. Why can't I list the title and link for the items?

#!/usr/bin/python

# Screen Scraper for Subs
import urllib
from xml.etree import ElementTree as ET

show = 'heroes'
season = '4'
language = 'en'
limit = '1'

requestURL = 'http://api.allsubs.org/index.php?' \
           + 'search=' + show \
           + '+season+' + season \
           + '&language=' + language \
           + '&limit=' + limit

root = ET.parse(urllib.urlopen(requestURL)).getroot()
print root
print '\n'

items = root.findall('items')
for item in items: 
     item.find('title').text # should print: <![CDATA[Heroes Season 4 Subtitles]]>
     item.find('link').text # Should print: http://www.allsubs.org/subs-download/heroes+season+4/1223435/

XML Response

        <AllSubsAPI> 
        <title>AllSubs API: Subtitles Search</title> 
        <link>http://www.allsubs.org</link> 
        <description><![CDATA[Subtitles Search for Heroes Season 4]]></description> 
        <language>en-us</language> 
        <results>1</results> 
        <found_results>24</found_results> 
<items> 
    <item> 
            <title><![CDATA[Heroes Season 4 Subtitles]]></title> 
            <link>http://www.allsubs.org/subs-download/heroes+season+4/1223435/</link> 
            <filename>heroes-season-4-english-heroes-season-4-en.zip</filename> 
            <files_in_archive>Heroes - 4x01-02 - Orientation.HDTV.FQM.en.srt|Heroes - 4x17 - The Art of Deception.HDTV.2HD.en.srt|Heroes - 4x07 - Strange Attractors.HDTV.LOL.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.HDTV.2HD.en.srt|Heroes - 4x07 - Strange Attractors.720p HDTV.DIMENSION.en.srt|Heroes - 4x05 - Hysterical Blindness.720p HDTV.X264.en.srt|Heroes - 4x09 - Shadowboxing.HDTV.LOL.en.srt|Heroes - 4x16 - Pass Fail.HDTV.LOL.en.srt|Heroes - 4x04 - Acceptance.HDTV.en.srt|Heroes - 4x01-02 - Orientation.720p HDTV.DIMENSION.en.srt|Heroes - 4x06 - Tabula Rasa.HDTV.NoTV.en.srt|Heroes - 4x10 - Brother's Keeper.HDTV.FQM.en.srt|Heroes - 4x04 - Acceptance.HDTV.FQM.en.srt|Heroes - 4x14 - Let It Bleed.720p HDTV.DIMENSION.en.srt|Heroes - 4x06 - Tabula Rasa.720p HDTV.SiTV.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.HDTV.NoTV.en.srt|Heroes - 4x12 - The Fifth Stage.HDTV.LOL.en.srt|Heroes - 4x19 - Brave New World.HDTV.LOL.en.srt|Heroes - 4x15 - Close to You.720p HDTV.DIMENSION.en.srt|Heroes - 4x03 - Ink.720p HDTV.DIMENSION.en.srt|Heroes - 4x11 - Thanksgiving.720p HDTV.DIMENSION.en.srt|Heroes - 4x13 - Upon This Rock.720p HDTV.DIMENSION.en.srt|Heroes - 4x13 - Upon This Rock.HDTV.LOL.en.srt|Heroes - 4x14 - Let It Bleed.HDTV.LOL.en.srt|Heroes - 4x15 - Close to You.HDTV.LOL.en.srt|Heroes - 4x12 - The Fifth Stage.720p HDTV.DIMENSION.en.srt|Heroes - 4x18 - The Wall.HDTV.LOL.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.720p HDTV.CTU.en.srt|Heroes - 4x17 - The Art of Deception.HDTV.CTU.en.srt|Heroes - 4x09 - Shadowboxing.720p HDTV.DIMENSION.en.srt|Heroes - 4x10 - Brother's Keeper.720p HDTV.DIMENSION.en.srt|Heroes - 4x04 - Acceptance.720p HDTV.CTU.en.srt|Heroes - 4x11 - Thanksgiving.HDTV.FQM.en.srt|Heroes - 4x03 - Ink.HDTV.FQM.en.srt|Heroes - 4x05 - Hysterical Blindness.HDTV.XII.en.srt|</files_in_archive> 
            <languages>en</languages> 
            <added_on>2010-02-16</added_on> 
    </item> 

</items> 
</AllSubsAPI>

UPDATE:

This worked, thanks for the help and pointing out my typo

items = root.findall('items/item')
for item in items: 
     print item.find('title').text
     print item.find('link').text

标签： python xml parsing api

3条回答

何必那么认真

2楼-- · 2019-02-09 21:19

You are not iterating over the 'item' elements, you are in fact iterating over the 'items' elements.

I think it should be:

items = root.findall('items')
childItems = items.findall('item')
for childItem in childItems: 
    childItem.find('title').text # should print: <![CDATA[Heroes Season 4 Subtitles]]>
    childItem.find('link').text # Should print: http://www.allsubs.org/subs-download/heroes+season+4/1223435

0人赞添加讨论(0) 举报

一纸荒年 Trace。

3楼-- · 2019-02-09 21:30

items = root.findall('items')

should be

items = root.findall('items/item')

0人赞添加讨论(0) 举报

淡お忘

4楼-- · 2019-02-09 21:31

This works for me. Note I'm using urllib2 to get through a proxy:

import urllib2
from xml.etree import ElementTree as ET

show = 'heroes'
season = '4'
language = 'en'
limit = '1'

requestURL = 'http://api.allsubs.org/index.php?' \
           + 'search=' + show \
           + '+season+' + season \
           + '&language=' + language \
           + '&limit=' + limit

root = ET.parse(urllib2.urlopen(requestURL)).getroot()
print root
print '\n'

items = root.findall('items')[0].findall('item')
for item in items: 
     print item.find('title').text # should print: <![CDATA[Heroes Season 4 Subtitles]]>
     print item.find('link').text # Should print: http://www.allsubs.org/subs-download/heroes+season+4/1223435/

note that findall('items') finds the "items" tag, what you want to loop over (I think) are the "item" tags within that, so we findall() of those. Also, you need to print to get anything out of python.

Also, if I do it with limit=2, I get a:

Traceback (most recent call last):
  File "heros.py", line 18, in <module>
    root = ET.parse(urllib2.urlopen(requestURL)).getroot()
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 862, in parse
    tree.parse(source, parser)
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 586, in parse
    parser.feed(data)
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 24, column 95

I'm not sure the XML coming back from this API is well-formed - there's no "xml" element at the start for starters. I wouldn't trust it...

0人赞添加讨论(0) 举报

Python (newbie) Parse XML from API call

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间