xmlstarlet sel on large file

Posted 2020-02-10 06:39

The command

$ xmlstarlet sel -t -c "/collection/record" file.xml

seems to load the whole file into memory before applying the given XPath expression. This is not usable for large XML files.

Does xmlstarlet provide a streaming mode to extract subelements from a large (100G+) XML file?

2 Answers
仙女界的扛把子
Answered 2020-02-10 07:23

Since I only needed a tiny subset of XPath for large XML files, I actually implemented a little tool myself: xmlcutty.

The example from my question could be written like this:

$ xmlcutty -path /collection/record file.xml
狗以群分
Answered 2020-02-10 07:36

xmlstarlet translates all (or most) of its operations into XSLT transformations, so the short answer is no.

You could try STX (Streaming Transformations for XML), a streaming transformation language similar to XSLT. On the other hand, just putting something together in Python using SAX or `iterparse` may be easier and faster (in terms of development time) if you don't need full-fidelity XML processing.
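To make the `iterparse` suggestion concrete, here is a minimal sketch of streaming extraction of `<record>` elements, matching the `/collection/record` shape from the question. The function name `extract_records` and the memory-freeing strategy are my own choices, not something from xmlstarlet; it assumes the target elements sit near the top of the tree.

```python
import xml.etree.ElementTree as ET

def extract_records(source, tag="record"):
    """Yield serialized <record> elements from an XML source, streaming.

    `source` can be a filename or a file-like object. Memory use stays
    bounded because each element is cleared after it is emitted.
    """
    # iterparse yields elements as their closing tags are parsed,
    # so we never build the full document tree at once.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield ET.tostring(elem, encoding="unicode")
            # Drop the element's children to keep memory flat. For truly
            # huge files you would also clear references from the root.
            elem.clear()
```

Usage would look something like `for rec in extract_records("file.xml"): print(rec)`. Note that `elem.clear()` only empties the element itself; already-processed siblings still hang off the root, so for a 100G+ file you may additionally want to track the root element (via `events=("start", "end")`) and clear its children as you go.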
