I use Python's iterparse to parse the XML result of a Nessus scan (a .nessus file). The parsing fails on unexpected records, while similar ones have been parsed correctly.
The general structure of the XML file is a lot of records like the one below:
<ReportHost>
    <ReportItem>
        <foo>9.3</foo>
        <bar>hello</bar>
    </ReportItem>
    <ReportItem>
        <foo>10.0</foo>
        <bar>world</bar>
    </ReportItem>
</ReportHost>
<ReportHost>
    ...
</ReportHost>
In other words, a lot of hosts (ReportHost) with a lot of items to report (ReportItem), the latter having several characteristics (foo, bar). I want to generate one line per item, with its characteristics.
The parsing fails in the middle of the file at a simple line (foo in that case being cvss_base_score):
<cvss_base_score>9.3</cvss_base_score>
while ~200 similar lines have been parsed without problems.
The relevant piece of code is below. It sets context markers (inReportHost and inReportItem) which tell me where in the structure of the XML file I am, and either assigns or prints a value, depending on the context:
import xml.etree.cElementTree as ET
inReportHost = False
inReportItem = False
for event, elem in ET.iterparse("test2.nessus", events=("start", "end")):
    if event == 'start' and elem.tag == "ReportHost":
        inReportHost = True
    if event == 'end' and elem.tag == "ReportHost":
        inReportHost = False
        elem.clear()
    if inReportHost:
        if event == 'start' and elem.tag == 'ReportItem':
            inReportItem = True
            cvss = ''
        if event == 'start' and inReportItem:
            if event == 'start' and elem.tag == 'cvss_base_score':
                cvss = elem.text
        if event == 'end' and elem.tag == 'ReportItem':
            print cvss
            inReportItem = False
cvss sometimes has the value None (after the cvss = elem.text assignment), even though identical entries have been parsed properly earlier in the file.
If I add something along the lines of the following below the assignment
if cvss is None: cvss = "0"
then many of the later cvss values are assigned properly (and some others are still None).
When I take the <ReportHost>...</ReportHost> block that is parsed wrongly and run it through the program on its own, it works fine (i.e. cvss is assigned 9.3 as expected).
I am lost as to where the mistake in my code is since, within a large set of similar records, some are processed correctly and some are not (some of the records are identical and are still processed differently). I also cannot find anything particular about the records that fail: identical ones earlier and later are fine.
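From the iterparse() docs: when a "start" event is emitted, only the tag and attributes of the element are guaranteed to be defined. The text and tail attributes are undefined at that point, and the children may or may not be present. If you need a fully populated element, look for "end" events instead.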
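To see how an element's text can be missing at its "start" event, one can feed the parser in two pieces so that the split falls between the opening tag and its text, which is roughly what happens when a read boundary lands there in a large file. The following is only a sketch: it uses the standard library's XMLPullParser (Python 3.4+) rather than iterparse, and the helper name text_seen_per_event is made up for illustration.

```python
import xml.etree.ElementTree as ET

def text_seen_per_event(chunks, tag):
    """Feed XML in pieces and record elem.text for `tag` at each event.

    Hypothetical helper, for illustration only.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    seen = []
    for chunk in chunks:
        parser.feed(chunk)
        # Read the events produced so far, before more data arrives.
        for event, elem in parser.read_events():
            if elem.tag == tag:
                seen.append((event, elem.text))
    parser.close()
    return seen

# The split falls right after the opening tag, before its text.
chunks = [
    "<ReportItem><cvss_base_score>",
    "9.3</cvss_base_score></ReportItem>",
]
seen = text_seen_per_event(chunks, "cvss_base_score")
for event, text in seen:
    print(event, repr(text))
```

When the text happens to arrive in the same piece as the opening tag, it may already be set at the "start" event, which would explain why identical records behave differently depending on where they sit in the file.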
Drop the inReport* variables and process ReportHost only on "end" events, when it is fully parsed. Use the ElementTree API to get the necessary info, such as cvss_base_score, from the current ReportHost element.
To preserve memory, clear each ReportHost element once it has been processed.
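A minimal sketch of that approach, assuming Python 3 and xml.etree.ElementTree. The function name iter_cvss_scores, the enclosing Report element and the inline sample data are illustrative assumptions, not part of the original file:

```python
import io
import xml.etree.ElementTree as ET

def iter_cvss_scores(source):
    """Yield the cvss_base_score text (or None) of every ReportItem.

    Hypothetical helper: `source` is a filename or a binary file object.
    """
    # Only "end" events are requested, so each element is fully populated.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "ReportHost":
            for item in elem.iter("ReportItem"):
                # findtext returns None when the child element is absent.
                yield item.findtext("cvss_base_score")
            # Free the host's children; the emptied element itself
            # stays attached to the root.
            elem.clear()

# Made-up sample standing in for test2.nessus.
SAMPLE = b"""<Report>
<ReportHost>
<ReportItem><cvss_base_score>9.3</cvss_base_score></ReportItem>
<ReportItem><cvss_base_score>10.0</cvss_base_score></ReportItem>
</ReportHost>
<ReportHost>
<ReportItem><bar>no score</bar></ReportItem>
</ReportHost>
</Report>"""

scores = list(iter_cvss_scores(io.BytesIO(SAMPLE)))
for score in scores:
    print(score)
```

With a real scan, the call would presumably be iter_cvss_scores("test2.nessus"). Items without a score come back as None, so a default such as "0" can be applied at the point of use.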