Is there an elegant way to count tag elements in a

I could read the content of the xml file to a string and use string operations to achieve this, but I guess there is a more elegant way to do this. Since I did not find a clue in the docus, I am sking here:

Given an xml (see below) file, how do you count xml tags, like count of author-tags in the example bewlow the most elegant way? We assume, that each author appears exactly once.

<root>
    <author>Tim</author>
    <author>Eva</author>
    <author>Martin</author>
    etc.
</root>

This xml file is trivial, but it is possible, that the authors are not always listed one after another, there may be other tags between them.

标签： python xml tags count lxml

3条回答

Emotional °昔

2楼-- · 2019-03-24 11:49

Use an XPath with count.

0人赞添加讨论(0) 举报

地球回转人心会变

3楼-- · 2019-03-24 11:58

If you want to count all author tags:

import lxml.etree
doc = lxml.etree.parse(xml)
count = doc.xpath('count(//author)')

0人赞添加讨论(0) 举报

再贱就再见

4楼-- · 2019-03-24 12:03

One must be careful using module re to treat a SGML/XML/HTML text, because not all treatments of such files can't be performed with regex (regexes aren't able to parse a SGML/HTML/XML text)

But here, in this particular problem, it seems to me it is possible (re.DOTALL is mandatory because an element may extend on more than one line; apart that, I can't imagine any other possible pitfall)

from time import clock
n= 10000
print 'n ==',n,'\n'



import lxml.etree
doc = lxml.etree.parse('xml.txt')

te = clock()
for i in xrange(n):
    countlxml = doc.xpath('count(//author)')
tf = clock()
print 'lxml\ncount:',countlxml,'\n',tf-te,'seconds'



import re
with open('xml.txt') as f:
    ch = f.read()

regx = re.compile('<author>.*?</author>',re.DOTALL)
te = clock()
for i in xrange(n):
    countre = sum(1 for mat in regx.finditer(ch))
tf = clock()
print '\nre\ncount:',countre,'\n',tf-te,'seconds'

result

n == 10000 

lxml
count: 3.0 
2.84083032899 seconds

re
count: 3 
0.141663256084 seconds

0人赞添加讨论(0) 举报

Is there an elegant way to count tag elements in a

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间