Reading XML using Python minidom and iterating ove

I have an XML structure that looks like the following, but on a much larger scale:

<root>
    <conference name='1'>
        <author>
            Bob
        </author>
        <author>
            Nigel
        </author>
    </conference>
    <conference name='2'>
        <author>
            Alice
        </author>
        <author>
            Mary
        </author>
    </conference>
</root>

For this, I used the following code:

from xml.dom.minidom import parse

dom = parse(filepath)
conference = dom.getElementsByTagName('conference')
for node in conference:
    conf_name = node.getAttribute('name')
    print(conf_name)
    alist = node.getElementsByTagName('author')
    for a in alist:
        authortext = a.nodeValue
        print(authortext)

However, the authortext that is printed out is None. I tried variations such as the line below, but that causes my program to break.

authortext=a[0].nodeValue

The correct output should be:

1
Bob
Nigel
2
Alice
Mary

But what I get is:

1
None
None
2
None
None

Any suggestions on how to tackle this problem?

Tags: python xml parsing minidom

5 Answers

地球回转人心会变

#2 · 2019-01-17 04:02

Quick access:

node.getElementsByTagName('author')[0].childNodes[0].nodeValue

爱情/是我丢掉的垃圾

#3 · 2019-01-17 04:06

Element nodes have no nodeValue of their own (in minidom it is always None). You have to look at the Text nodes inside them. If you know there is always exactly one text node inside, you can use element.firstChild.data (data is the same as nodeValue for text nodes).

Be careful: if there is no text content, there will be no child Text nodes and element.firstChild will be None, so the .data access fails with an AttributeError.

Quick way to get the content of direct child text nodes:

text = ''.join(child.data for child in element.childNodes if child.nodeType == child.TEXT_NODE)

In DOM Level 3 Core you get the textContent property you can use to get text from inside an Element recursively, but minidom doesn't support this (some other Python DOM implementations do).
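
A minimal sketch of the two approaches described above, assuming Python 3 and the standard xml.dom.minidom module. The helper names direct_text and all_text are made up for illustration and are not part of minidom; all_text only approximates what textContent would return.

```python
from xml.dom.minidom import parseString


def direct_text(element):
    """Join the data of the element's direct child Text nodes.

    Returns '' for an empty element instead of raising, because the
    generator simply finds no children.
    """
    return ''.join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def all_text(node):
    """Collect text recursively (hypothetical stand-in for textContent)."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return ''.join(all_text(child) for child in node.childNodes)


doc = parseString('<a>one<b>two</b>three<c/></a>')
root = doc.documentElement
empty = root.getElementsByTagName('c')[0]

print(repr(direct_text(root)))  # text directly under <a> only
print(repr(all_text(root)))     # text from <a> and all of its descendants
print(empty.firstChild)         # None, so empty.firstChild.data would raise
```

direct_text is the safer choice when an element may be empty, since it never touches firstChild.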

SAY GOODBYE

#4 · 2019-01-17 04:09

I played around with it a bit, and here's what I got to work:

# ...
authortext = a.childNodes[0].nodeValue
print(authortext)

leading to output of:

C:\temp\py>xml2.py
1
Bob
Nigel
2
Alice
Mary

I can't tell you exactly why you have to access the childNode to get the inner text, but at least that's what you were looking for.
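
As for the why: in the DOM, the text inside an element is stored in a separate child node. A small sketch, assuming Python 3 and a one-element document invented for illustration, that inspects both nodes:

```python
from xml.dom.minidom import parseString

doc = parseString('<author>Bob</author>')
author = doc.documentElement   # the <author> element itself
text = author.firstChild       # the Text node holding "Bob"

# Element nodes carry no value of their own.
print(author.nodeType == author.ELEMENT_NODE, author.nodeValue)

# The text lives in the child Text node.
print(text.nodeType == text.TEXT_NODE, text.nodeValue)
```

The first line prints True None and the second prints True Bob, which is why a.nodeValue gave None while a.childNodes[0].nodeValue gave the name.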

等我变得足够好

#5 · 2019-01-17 04:11

Your node a is of type 1 (ELEMENT_NODE), whose nodeValue is None; you normally need a TEXT_NODE to get a string. This will work:

a.childNodes[0].nodeValue

霸刀☆藐视天下

#6 · 2019-01-17 04:20

Since you always have one text node per author, you can use element.firstChild.data:

from xml.dom.minidom import parseString

# document is assumed to be a string holding the XML
dom = parseString(document)
conferences = dom.getElementsByTagName("conference")

# Each conference here is an Element node
for conference in conferences:
    conference_name = conference.getAttribute("name")
    print()
    print(conference_name.upper() + " - ")

    authors = conference.getElementsByTagName("author")
    for author in authors:
        # firstChild is the Text node inside <author>
        print("  ", author.firstChild.data)

    print()
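
One detail worth noting: in the XML from the question each name sits on its own indented line, so firstChild.data also contains the surrounding newlines and spaces. A sketch, assuming Python 3 and the question's XML pasted into a string, that trims them with str.strip() to get exactly the output asked for:

```python
from xml.dom.minidom import parseString

document = """<root>
    <conference name='1'>
        <author>
            Bob
        </author>
        <author>
            Nigel
        </author>
    </conference>
    <conference name='2'>
        <author>
            Alice
        </author>
        <author>
            Mary
        </author>
    </conference>
</root>"""

dom = parseString(document)
lines = []
for conference in dom.getElementsByTagName('conference'):
    lines.append(conference.getAttribute('name'))
    for author in conference.getElementsByTagName('author'):
        # The raw data includes the newlines and indentation around the name.
        lines.append(author.firstChild.data.strip())

print('\n'.join(lines))
```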
