I have a collection of XML files, some of which are quite large (up to ~50 million element nodes). I use xmllint to validate them, which works well even for the huge ones thanks to the streaming API.
xmllint --loaddtd --stream --valid /path/to/huge.xml
I recently learned that xmllint can also run XPath queries from the command line, which is very handy.
xmllint --loaddtd --xpath '/root/a/b/c/text()' /path/to/small.xml
However, these XPath queries do not work for the huge XML files: after some time the process just prints "Killed". I tried enabling the streaming API, but that produces no output at all.
xmllint --loaddtd --stream --xpath '/root/a/b/c/text()' /path/to/huge.xml
Is there a way to enable streaming mode when doing XPath queries using xmllint? Are there other/better ways to run command-line XPath queries on huge XML files?
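To make concrete what kind of streaming evaluation I have in mind, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree.iterparse`. It is not an xmllint feature and not a real XPath engine: the helper name `stream_text` is made up for illustration, it only matches a plain absolute path such as `/root/a/b/c`, it returns only the text that precedes any child element (not every text node that `text()` would select), and it assumes no namespaces and no entities that need an external DTD.

```python
import io
import xml.etree.ElementTree as ET


def stream_text(source, path):
    """Yield the leading text of every element located at `path`.

    `source` is a filename or a binary file object.
    `path` is a simple slash-separated absolute path like '/root/a/b/c'.
    This is a hypothetical helper, not a full XPath implementation.
    """
    target = path.strip("/").split("/")
    tags = []   # tag names of the currently open elements
    elems = []  # the corresponding element objects
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            tags.append(elem.tag)
            elems.append(elem)
            continue
        # "end" event: the element is complete, so its text is available.
        if tags == target and elem.text is not None:
            yield elem.text
        tags.pop()
        elems.pop()
        # Detach the finished element from its parent so the tree
        # does not keep growing while the file is being read.
        if elems:
            elems[-1].remove(elem)


if __name__ == "__main__":
    sample = b"<root><a><b><c>one</c><c>two</c></b></a></root>"
    for text in stream_text(io.BytesIO(sample), "/root/a/b/c"):
        print(text)
```

Passing a filename instead of the `BytesIO` object, e.g. `stream_text("/path/to/huge.xml", "/root/a/b/c")`, reads the file incrementally, so memory use should depend on nesting depth and the parser's read buffer rather than on the total number of nodes.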