iterparse fails to parse a field, while other similar ones are fine

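I am using Python's iterparse to parse the XML results of a Nessus scan (a .nessus file). Parsing fails on unexpected records, even though similar records have been parsed correctly.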

The general structure of the XML file is many records like the one below:

<ReportHost>
  <ReportItem>
    <foo>9.3</foo>
    <bar>hello</bar>
  </ReportItem>
  <ReportItem>
    <foo>10.0</foo>
    <bar>world</bar>
  </ReportItem>
</ReportHost>
<ReportHost>
   ...
</ReportHost>

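In other words, there are many hosts (ReportHost), each with many report items (ReportItem), and each item has several characteristics (foo, bar). I want to generate one line per item, listing its characteristics.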

Parsing fails in the middle of the file on a simple line (foo in this case is cvss_base_score):

<cvss_base_score>9.3</cvss_base_score>

while about 200 similar lines have been parsed without problems.

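The relevant piece of code is below. It sets context markers (inReportHost and inReportItem, which tell me where in the XML structure I am) and either assigns or prints a value, depending on the context.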

import xml.etree.cElementTree as ET
inReportHost = False
inReportItem = False

for event, elem in ET.iterparse("test2.nessus", events=("start", "end")):
    if event == 'start' and elem.tag == "ReportHost":
        inReportHost = True
    if event == 'end' and elem.tag == "ReportHost":
        inReportHost = False
        elem.clear()
    if inReportHost:
        if event == 'start' and elem.tag == 'ReportItem':
            inReportItem = True
            cvss = ''
        if event == 'start' and inReportItem:
            if event == 'start' and elem.tag == 'cvss_base_score':
                cvss = elem.text
        if event == 'end' and elem.tag == 'ReportItem':
            print cvss
            inReportItem = False

cvss sometimes has the value None (after the cvss = elem.text assignment), even though identical entries were parsed properly earlier in the file.

If I add something along the lines of the following below the assignment

if cvss is None: cvss = "0"

then many of the subsequently parsed cvss values are assigned their proper values (and some others are None).

When I take the <ReportHost>...</ReportHost> block that is parsed wrongly and run it through the program on its own, it works fine (i.e. cvss is assigned 9.3 as expected).

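I am lost as to where I made a mistake in my code because, within a large set of similar records, some are processed correctly and some are not (some records are identical and are still processed differently). I also cannot find anything special about the failing records: identical ones earlier and later in the file are fine.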

From the iterparse() documentation:

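Note: iterparse() only guarantees that it has seen the ">" character of a starting tag when it emits a "start" event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present. If you need a fully populated element, look for "end" events instead.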
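The behavior described in this note can be reproduced in a small experiment. The following is a minimal sketch, not something from the documentation: it uses xml.etree.ElementTree.XMLPullParser (Python 3.4+) to feed a document in two hand-picked chunks, so that the boundary falls right after the <cvss_base_score> start tag. The helper name observe_events and the sample chunks are made up for illustration. As far as I know, iterparse() also reads its input in fixed-size blocks, which would explain why only the occasional record, one that happens to straddle a block boundary, comes out as None.

```python
import xml.etree.ElementTree as ET


def observe_events(chunks):
    """Feed XML chunks one at a time; record elem.text as seen at each event."""
    parser = ET.XMLPullParser(events=("start", "end"))
    seen = []

    def drain():
        # Record the text exactly as it looks when the event is delivered.
        for event, elem in parser.read_events():
            seen.append((event, elem.tag, elem.text))

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()
    drain()  # pick up anything the parser only released on close()
    return seen


# Hypothetical input, split right after the start tag of cvss_base_score.
chunks = ["<ReportItem><cvss_base_score>", "9.3</cvss_base_score></ReportItem>"]
for event, tag, text in observe_events(chunks):
    print(event, tag, repr(text))
```

The text recorded at the "start" event of cvss_base_score is typically still empty here, because the parser has not seen "9.3" yet, whereas the "end" event sees the complete value.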

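Drop the inReport* variables and process ReportHost only on "end" events, when it has been fully parsed. Use the ElementTree API to get the necessary information, such as cvss_base_score, from the current ReportHost element.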

To preserve memory, do this:

import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9

def getelements(filename_or_file, tag):
    context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
    _, root = next(context) # get root element
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear() # preserve memory

for host in getelements("test2.nessus", "ReportHost"):
    for cvss_el in host.iter("cvss_base_score"):
        print(cvss_el.text)
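
Since the original goal was one line per ReportItem with its characteristics, the same idea can be extended roughly as follows. This is a sketch under assumptions: the function name iter_report_items and the choice of fields are illustrative, and the sample document only mimics the simplified structure shown in the question, not a real .nessus file. findtext() with a default avoids None for items that lack a score.

```python
import io
import xml.etree.ElementTree as ET


def iter_report_items(source, fields=("cvss_base_score",)):
    """Yield one tuple of field texts per ReportItem, read on 'end' events."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        # A ReportHost is only guaranteed to be complete at its 'end' event.
        if event == "end" and elem.tag == "ReportHost":
            for item in elem.iter("ReportItem"):
                yield tuple(item.findtext(f, default="") for f in fields)
            root.clear()  # drop hosts that were already processed


# Hypothetical sample data (assumption: simplified structure from the question).
sample = io.BytesIO(
    b"<Report>"
    b"<ReportHost>"
    b"<ReportItem><cvss_base_score>9.3</cvss_base_score></ReportItem>"
    b"<ReportItem><bar>no score here</bar></ReportItem>"
    b"</ReportHost>"
    b"</Report>"
)
for row in iter_report_items(sample):
    print("\t".join(row))
```

Reading the fields from the finished ReportHost means no state flags are needed, and a missing cvss_base_score shows up as an empty string instead of None.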