I have this XML file:
<domain type='kmc' id='007'>
<name>virtual bug</name>
<uuid>66523dfdf555dfd</uuid>
<os>
<type arch='xintel' machine='ubuntu'>hvm</type>
<boot dev='hd'/>
<boot dev='cdrom'/>
</os>
<memory unit='KiB'>524288</memory>
<currentMemory unit='KiB'>270336</currentMemory>
<vcpu placement='static'>10</vcpu>
Now I want to parse this and fetch values from it. For instance, I want to fetch the uuid
field. What is the proper way to do that in Python?
Here's an lxml snippet that extracts an attribute as well as element text (your question was a little ambiguous about which one you needed, so I'm including both):
from lxml import etree
doc = etree.parse(filename)
memoryElem = doc.find('memory')
print(memoryElem.text)         # element text
print(memoryElem.get('unit'))  # attribute
You asked (in a comment on Ali Afshar's answer) whether minidom (2.x, 3.x) is a good alternative. Here's the equivalent code using minidom; judge for yourself which is nicer:
import xml.dom.minidom as minidom
doc = minidom.parse(filename)
memoryElem = doc.getElementsByTagName('memory')[0]
print(''.join(node.data for node in memoryElem.childNodes))  # element text
print(memoryElem.getAttribute('unit'))                       # attribute
lxml seems like the winner to me.
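For comparison, here is a minimal sketch of the same lookup using xml.etree.ElementTree from the standard library, whose basic API lxml largely mirrors. It parses from a string instead of a file so that it runs on its own; DOMAIN_XML is an assumption for illustration (a trimmed copy of the question's document with the closing </domain> tag added).

```python
import xml.etree.ElementTree as ET

# Assumption: a trimmed copy of the question's XML, with </domain> added.
DOMAIN_XML = """<domain type='kmc' id='007'>
<name>virtual bug</name>
<uuid>66523dfdf555dfd</uuid>
<memory unit='KiB'>524288</memory>
</domain>"""

root = ET.fromstring(DOMAIN_XML)       # root is the <domain> element
memory_elem = root.find('memory')      # first direct child named <memory>

memory_text = memory_elem.text         # element text
memory_unit = memory_elem.get('unit')  # attribute value
uuid_text = root.findtext('uuid')      # text of <uuid>, or None if absent

print(memory_text, memory_unit, uuid_text)
```

Note that find() returns None when nothing matches, so real code should check the result before reading .text from it.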
XML:
<data>
<items>
<item name="item1">item1</item>
<item name="item2">item2</item>
<item name="item3">item3</item>
<item name="item4">item4</item>
</items>
</data>
Python:
from xml.dom import minidom
xmldoc = minidom.parse('items.xml')
itemlist = xmldoc.getElementsByTagName('item')
print("Len :", len(itemlist))
print("Attribute Name :", itemlist[0].attributes['name'].value)
print("Text :", itemlist[0].firstChild.nodeValue)
for s in itemlist:
    print("Attribute Name :", s.attributes['name'].value)
    print("Text :", s.firstChild.nodeValue)
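One caveat with firstChild.nodeValue: firstChild is None for an empty element such as <item name="empty"/>, so that lookup raises AttributeError. Below is a sketch of a more defensive variant. The node_text helper and the inline ITEMS_XML string (which includes an empty item) are assumptions made for illustration, not part of the answer above.

```python
from xml.dom import minidom

# Assumption: a shortened copy of the sample document plus one empty item.
ITEMS_XML = """<data>
<items>
<item name="item1">item1</item>
<item name="item2">item2</item>
<item name="empty"/>
</items>
</data>"""


def node_text(element):
    """Join the text nodes directly under element ('' if there are none)."""
    return ''.join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


xmldoc = minidom.parseString(ITEMS_XML)
items = [
    (item.getAttribute('name'), node_text(item))
    for item in xmldoc.getElementsByTagName('item')
]

for name, text in items:
    print("Attribute Name :", name, "| Text :", text)
```

getAttribute() returns an empty string for a missing attribute, whereas attributes['name'] raises KeyError, so which one to use depends on whether a missing attribute should be treated as an error.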
Use etree, probably the one from lxml:
from lxml import etree
root = etree.XML(MY_XML)  # MY_XML holds the document as a string
uuid = root.find('uuid')
print(uuid.text)
Other people can tell you how to do it with the Python standard library. I'd recommend my own mini-library, which makes this completely straightforward.
>>> obj = xml2obj.xml2obj("""<domain type='kmc' id='007'>
... <name>virtual bug</name>
... <uuid>66523dfdf555dfd</uuid>
... <os>
... <type arch='xintel' machine='ubuntu'>hvm</type>
... <boot dev='hd'/>
... <boot dev='cdrom'/>
... </os>
... <memory unit='KiB'>524288</memory>
... <currentMemory unit='KiB'>270336</currentMemory>
... <vcpu placement='static'>10</vcpu>
... </domain>""")
>>> obj.uuid
u'66523dfdf555dfd'
http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
I would use lxml and pull the value out with the XPath expression //uuid (element names are case-sensitive, so it must be lower case).
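A rough sketch of that kind of path lookup follows. To stay dependency-free it swaps in the limited XPath subset of the standard library's xml.etree.ElementTree, where './/uuid' matches a <uuid> element at any depth; with lxml, the full XPath equivalent would be root.xpath('//uuid/text()'). The lookup is case-sensitive, so the path has to spell uuid in lower case. DOMAIN_XML is an illustrative, trimmed copy of the question's document.

```python
import xml.etree.ElementTree as ET

# Assumption: a trimmed copy of the question's XML, with </domain> added.
DOMAIN_XML = """<domain type='kmc' id='007'>
<uuid>66523dfdf555dfd</uuid>
<os>
<boot dev='hd'/>
<boot dev='cdrom'/>
</os>
</domain>"""

root = ET.fromstring(DOMAIN_XML)

uuid_value = root.findtext('.//uuid')   # descendant search, lower-case name
upper_value = root.findtext('.//UUID')  # None: no element is named UUID

# Paths can also walk the hierarchy and collect attributes from each match.
boot_devs = [boot.get('dev') for boot in root.findall('./os/boot')]

print(uuid_value, upper_value, boot_devs)
```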
The XML above is missing its closing </domain> tag, so parsing it fails with:
etree parse error: Premature end of data in tag
The corrected XML is:
<domain type='kmc' id='007'>
<name>virtual bug</name>
<uuid>66523dfdf555dfd</uuid>
<os>
<type arch='xintel' machine='ubuntu'>hvm</type>
<boot dev='hd'/>
<boot dev='cdrom'/>
</os>
<memory unit='KiB'>524288</memory>
<currentMemory unit='KiB'>270336</currentMemory>
<vcpu placement='static'>10</vcpu>
</domain>
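The difference is easy to check before relying on a file. This is a small sketch using the standard library's xml.etree.ElementTree rather than lxml, so the wording of the error message will differ from the one quoted above. The try_parse helper and the two shortened XML strings are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: shortened stand-ins for the broken and corrected documents.
TRUNCATED = "<domain type='kmc' id='007'>\n<uuid>66523dfdf555dfd</uuid>\n"
COMPLETE = TRUNCATED + "</domain>"


def try_parse(text):
    """Return (root, None) on success or (None, error message) on failure."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as exc:
        return None, str(exc)


bad_root, bad_error = try_parse(TRUNCATED)
good_root, good_error = try_parse(COMPLETE)

print("truncated:", bad_error)
print("complete :", good_root.findtext('uuid'))
```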