Memory leak when parsing XML with xml.dom.minidom

Posted 2019-02-20 11:03

Question:

I'm using xml.dom.minidom to parse XML files, somewhat like this:

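import xml.dom.minidom as dom

# Close the file as soon as parsing is done; the DOM does not need it afterwards.
with open('file.xml') as file:
    doc = dom.parse(file)
# SNIP
doc.unlink()  # break the reference cycles inside the DOM tree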

Even after unlinking the document, memory usage stays at about 120 MiB. In normal use, where the program parses multiple XML files, memory usage climbs to about 300 MiB, which is unacceptable.

I'm sure the memory leak isn't caused by my code, but by minidom, because even doing just

doc = dom.parse(file)
doc.unlink()

produces the same result.
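
A minimal sketch of one way to check this, assuming the standard-library tracemalloc module is an acceptable measuring tool. The function name traced_after_parse and the generated sample document are made up for illustration. Note that tracemalloc only counts memory allocated by Python objects, so it can differ from what the operating system reports for the process.

```python
import gc
import tracemalloc
import xml.dom.minidom as dom


def traced_after_parse(xml_text):
    """Return (bytes traced while the DOM is alive, bytes traced after cleanup)."""
    gc.collect()
    tracemalloc.start()
    doc = dom.parseString(xml_text)
    alive, _peak = tracemalloc.get_traced_memory()
    doc.unlink()   # break reference cycles in the tree
    del doc        # drop the last reference to the document
    gc.collect()   # collect anything still held by cycles
    freed, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return alive, freed


# Hypothetical sample input: a flat document with many small elements.
xml_text = "<root>" + "<item>x</item>" * 10000 + "</root>"
alive, freed = traced_after_parse(xml_text)
print("while alive:", alive, "bytes; after unlink:", freed, "bytes")
```

If the second number drops well below the first while the process size reported by the operating system stays high, the objects were released at the Python level and the remaining usage may come from the allocator holding on to freed memory rather than from live DOM nodes.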

Am I doing something wrong, or is this a bug in minidom?

P.S.: I'd prefer to stick with minidom, because there's a lot of XML parsing happening in my code and I'd rather not rewrite all of it, but I will if there's no other choice.

Answer 1:

I am also observing the same issues with minidom! And we are not alone. See for example here.

The suggestion there is to use another XML implementation with Python bindings, such as:

  • xml.etree.ElementTree: an alternative implementation in the Python standard library
  • libxml2: an XML parser written in C, with Python bindings
  • lxml: a more Pythonic binding to libxml2
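
As a rough sketch of what the first option can look like, here is a streaming parse with xml.etree.ElementTree.iterparse, which lets you discard each element once you are done with it. The function name count_items, the tag name "item", and the sample document are made up for illustration. This is not a drop-in replacement for minidom: code that walks the DOM would have to be adapted to the ElementTree API.

```python
import io
import xml.etree.ElementTree as ET


def count_items(source, tag="item"):
    """Stream through an XML source and count elements named `tag`."""
    count = 0
    # "end" events fire once an element has been completely parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # drop the element's children, text, and attributes
    return count


# Hypothetical in-memory document; a filename or an open file works as well.
sample = io.BytesIO(b"<root><item>a</item><item>b</item><other/></root>")
print(count_items(sample))
```

Clearing elements as you go keeps most of the parsed content from piling up, which is the main reason to prefer iterparse over building the whole tree when the files are large.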