Memory leak when parsing XML with xml.dom.minidom

Posted 2019-02-20 11:03

爷、活的狠高调

Question:

I'm using xml.dom.minidom to parse xml files, somewhat like this:

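import xml.dom.minidom as dom

# Open the file in a with block so the handle is closed after parsing
with open('file.xml') as file:
    doc = dom.parse(file)
# SNIP
# Break the DOM's internal reference cycles once the document is no longer needed
doc.unlink()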

Even after unlinking the document, memory usage stays at about 120 MiB. In real use of the program, where multiple XML files get parsed, it climbs to about 300 MiB, which is unacceptable.

I'm sure the memory leak isn't caused by my code, but by minidom, because even doing just

doc = dom.parse(file)
doc.unlink()

produces the same result.
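
One way to check a claim like this is to measure what Python itself still holds after unlink(). The following is only a minimal sketch using the standard tracemalloc module; the helper name measure_parse and the sample document are made up for illustration and are not part of the original program.

```python
import gc
import tracemalloc
import xml.dom.minidom as dom


def measure_parse(xml_text):
    """Return (bytes traced while the DOM is alive, bytes traced after unlink)."""
    tracemalloc.start()
    try:
        doc = dom.parseString(xml_text)
        loaded, _peak = tracemalloc.get_traced_memory()
        doc.unlink()   # break the parent/child reference cycles
        del doc        # drop the last reference to the document
        gc.collect()   # collect anything still held in cycles
        released, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return loaded, released


# Hypothetical sample input, standing in for file.xml
sample = "<root>" + "<item id='1'>text</item>" * 2000 + "</root>"
loaded, released = measure_parse(sample)
print("while loaded:", loaded, "bytes; after unlink:", released, "bytes")
```

tracemalloc only counts allocations made through Python's allocator. If the second number drops sharply while the process size reported by the operating system stays high, the memory may have been freed by Python but not handed back to the OS, which is a different situation from objects that are still referenced.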

Am I doing something wrong, or is this a bug in minidom?

P.S.: I'd prefer to stick to minidom, because there's a lot of xml parsing happening in my code, and I'd rather not completely rewrite all of it, but I will do it if there's no other choice.

Answer 1:

I am also observing the same issues with minidom! And we are not alone. See for example here.

There it is suggested to use another XML implementation with Python bindings, such as:

xml.etree.ElementTree: an alternative implementation in the Python standard library
libxml2: an XML parser written in C, with Python bindings
lxml: a more Pythonic binding to libxml2
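
As a rough illustration of the first option, here is a minimal sketch that uses xml.etree.ElementTree.iterparse to process a document incrementally and clear each element once it has been handled. The function name count_items and the tag name "item" are assumptions made for the example; the real code would use whatever elements the files contain.

```python
import io
import xml.etree.ElementTree as ET


def count_items(source, tag="item"):
    """Count <tag> elements, clearing each one after it has been processed."""
    count = 0
    # "end" events fire once an element and all of its children are fully parsed
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # drop this element's children, text and attributes
    return count


# Hypothetical sample input; a filename such as 'file.xml' works as source too
sample = b"<root>" + b"<item>text</item>" * 1000 + b"</root>"
print(count_items(io.BytesIO(sample)))  # 1000
```

Note that clear() empties an element but leaves the emptied element attached to its parent, so very large documents may need the processed children removed from the root as well. Whether this helps in the case described above depends on how the existing minidom code walks the tree.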