saving an 'lxml.etree._ElementTree' object

I've spent the last couple of days getting to grips with the basics of lxml, in particular using lxml.html to parse websites and create an ElementTree of the content. Ideally, I want to save the returned ElementTree so that I can load it up and experiment with it without having to parse the website every time I modify my script. I assumed that pickling would be the way to go, but I'm now beginning to wonder. Although I am able to retrieve an ElementTree object after pickling...

type(myObject)

returns

<class 'lxml.etree._ElementTree'>

the object itself appears to be 'empty', since none of the subsequent method/attribute calls I make on it yield any output.

My guess is that pickling isn't appropriate here, but can anyone suggest an alternative?

(In case it matters, the above is happening in Python 3.2 with lxml 2.3.2 on Snow Leopard.)

Tags: python lxml pickle

3 answers

我欲成王，谁敢阻挡

#2 · 2019-01-18 12:36

lxml is a Python binding for the C libraries libxml2 and libxslt, and its objects probably don't support Python pickling or any other kind of serialization, except serializing them to XML.

So you'll either have to keep them in memory, or re-parse the XML fragments you need, I assume.
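If you still want to go through pickle (for example because the tree is stored next to other pickled data), one option is to pickle the serialized XML bytes and re-parse them on load. This is only a minimal sketch: the helper names dumps_tree and loads_tree and the sample document are made up for illustration.

```python
import pickle

from lxml import etree


def dumps_tree(tree):
    """Pickle the serialized XML bytes rather than the tree object itself."""
    return pickle.dumps(etree.tostring(tree))


def loads_tree(blob):
    """Unpickle the XML bytes and re-parse them into a new ElementTree."""
    root = etree.fromstring(pickle.loads(blob))
    return root.getroottree()


# Hypothetical sample document standing in for the real parsed content.
original = etree.fromstring("<root><item id='1'>hello</item></root>").getroottree()

restored = loads_tree(dumps_tree(original))
print(type(restored))
print(restored.getroot().findtext("item"))
```

The restored object is a freshly parsed tree, so the parsing cost is still paid on every load; only the network fetch is avoided.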


Explosion°爆炸

#3 · 2019-01-18 12:38

You are already dealing with XML, and lxml is great at parsing XML. So I think the simplest thing to do would be to serialize to XML:

To write to file:

import lxml.etree as ET

filename = '/tmp/test.xml'
myobject.write(filename)

To call the write method, note that myobject must be an lxml.etree._ElementTree. If it is an lxml.etree._Element, then you would need myobject.getroottree().write(filename).

To parse from file name/path, file object, or URL:

myobject = ET.parse(file_or_url)

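To parse from string (note that this returns the root lxml.etree._Element rather than an _ElementTree; call getroottree() on it if you need the tree):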

myobject = ET.fromstring(content)
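Since the question uses lxml.html rather than plain XML, here is a rough sketch of the same round trip for an HTML page. The inline HTML string and the temporary cache path are placeholders I made up; in a real script the first parse would be the slow fetch of the website.

```python
import os
import tempfile

import lxml.html

# Placeholder page content; the real script would fetch and parse a website here.
html_source = "<html><body><h1>Title</h1><p class='intro'>Hello</p></body></html>"

# Parse once and get the ElementTree that owns the root element.
root = lxml.html.fromstring(html_source)
tree = root.getroottree()

# Save the tree as HTML so later runs can skip the fetch.
cache_path = os.path.join(tempfile.mkdtemp(), "page_cache.html")
tree.write(cache_path, method="html", encoding="utf-8")

# On later runs, load the cached copy with the HTML parser.
reloaded = lxml.html.parse(cache_path)
print(reloaded.getroot().xpath("//p[@class='intro']/text()"))
```

Writing with method="html" and reading back with lxml.html.parse keeps both steps on the HTML side; serializing a scraped page as XML and re-parsing it with the XML parser can be less forgiving.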


太酷不给撩

#4 · 2019-01-18 12:48

I don't believe you can pickle lxml instances. I was in a similar situation, and what I did was pickle the Python object instances that build the tree.

Each instance and its children had a function to build the element tree. So I would simply pickle/cache the Python objects, fetch them from the cache, and then call the build functions to get my element tree.
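A rough sketch of that idea, not my actual code: here the description of the tree is a plain nested dict instead of class instances, and a single build function stands in for the per-instance build methods. The dict keys (tag, text, attrib, children) are a layout invented for this example.

```python
import pickle

from lxml import etree


def build(spec):
    """Recursively turn a plain-dict description into an lxml element."""
    element = etree.Element(spec["tag"], attrib=spec.get("attrib", {}))
    element.text = spec.get("text")
    for child in spec.get("children", []):
        element.append(build(child))
    return element


# Plain Python data is picklable, unlike the lxml objects built from it.
spec = {
    "tag": "root",
    "children": [
        {"tag": "item", "text": "hello", "attrib": {"id": "1"}},
    ],
}

cached = pickle.dumps(spec)                            # store this in the cache
tree = etree.ElementTree(build(pickle.loads(cached)))  # rebuild on demand
print(etree.tostring(tree).decode())
```

This only pays off when the tree is generated from your own data. For a tree parsed from a website, saving the serialized markup is usually the simpler route.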
