Using R and the XML package, I'm parsing huge XML files. As part of the data handling I need to know, for a long list of nodes, how many children with a specific name each node has (the number of nodes can exceed 20,000).
My approach at the moment is:
nChildrenWithName <- xpathSApply(doc, path="/path/to/node/*", namespaces=ns, xmlName) == 'NAME'
nChildren <- xpathSApply(doc, path="/path/to/node", namespaces=ns, fun=xmlSize)
nID <- sapply(split(nChildrenWithName, rep(seq(along=nChildren), nChildren)), sum)
That is as vectorized as I can get it. Still, I have the feeling that this could be achieved in a single call with the right XPath expression. My knowledge of XPath is limited, though, so if anyone knows how to do it I would be grateful for some insight.
best Thomas
If I understand the question correctly, the XML looks something like this:
<path>
<to>
<node>
<NAME>A</NAME>
<NAME>B</NAME>
<NAME>C</NAME>
</node>
<node>
<NAME>X</NAME>
<NAME>Y</NAME>
</node>
</to>
<to>
<node>
<NAME>AA</NAME>
<NAME>BB</NAME>
<NAME>CC</NAME>
</node>
</to>
</path>
and what is wanted is the number of NAME elements under each node element, so 3, 2, 3 in the example above.
This is not possible in XPath 1.0: an expression can return a list of nodes or a single value - but not a list of computed values.
Using XPath 2.0 you can write:
for $node in /path/to/node return count($node/NAME)
or simply:
/path/to/node/count(NAME)
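Since the XML package in R is built on libxml2, which implements XPath 1.0 only, the XPath 2.0 expressions above will not run there. A minimal sketch of a workaround, assuming the example document above: select the node elements with a single XPath call and count the matching children on the R side. The variable names (xml_text, counts) are only for illustration.

```r
library(XML)

# The example document from above, inlined so the snippet is self-contained
xml_text <- paste0(
  "<path>",
  "<to>",
  "<node><NAME>A</NAME><NAME>B</NAME><NAME>C</NAME></node>",
  "<node><NAME>X</NAME><NAME>Y</NAME></node>",
  "</to>",
  "<to>",
  "<node><NAME>AA</NAME><NAME>BB</NAME><NAME>CC</NAME></node>",
  "</to>",
  "</path>")
doc <- xmlParse(xml_text, asText = TRUE)

# One XPath call selects every node element; the function then counts
# the direct children named NAME for each of them
counts <- xpathSApply(doc, "/path/to/node", function(node) {
  sum(names(xmlChildren(node)) == "NAME")
})

counts  # 3 2 3 for the example document
```

A node without children yields 0 rather than being dropped, so the result stays aligned with the node list. For a namespaced document the namespaces=ns argument from the question can be passed to xpathSApply in the same way.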
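library(XML)
# Parse the example file shipped with the XML package into internal nodes
doc <- xmlTreeParse(
  system.file("exampleData", "mtcars.xml", package = "XML"),
  useInternalNodes = TRUE)
# count() returns a single number for the whole document
# (the total number of variable elements), not one count per parent
xpathApply(xmlRoot(doc), path = "count(//variable)", xmlValue)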
Consider the example mentioned by MiMo:
<path>
<to>
<node>
<NAME>A</NAME>
<NAME>B</NAME>
<NAME>C</NAME>
</node>
<node>
<NAME>X</NAME>
<NAME>Y</NAME>
</node>
</to>
<to>
<node>
<NAME>AA</NAME>
<NAME>BB</NAME>
<NAME>CC</NAME>
</node>
</to>
</path>
To get the number of NAME elements under the first /path/to/node:
library(XML)
doc = xmlParse("filename", useInternalNodes = TRUE)
rootNode = xmlRoot(doc)
childnodes = xpathSApply(rootNode[[1]][[1]], ".//NAME", xmlChildren)
length(childnodes)
[1] 3
This gives 3. Similarly, to get the number of NAME elements under the second node, just change the index accordingly:
childnodes = xpathSApply(rootNode[[1]][[2]], ".//NAME", xmlChildren)
length(childnodes)
[1] 2
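Rather than changing the index by hand, the same relative-path idea can be applied to every node in a loop. The sketch below is only an illustration under stated assumptions: the group element is hypothetical and was added to show that ".//NAME" matches NAME elements at any depth below the node, while "./NAME" matches direct children only.

```r
library(XML)

# Small illustrative document; the <group> wrapper is hypothetical and
# only there to create a NAME element that is not a direct child
xml_text <- paste0(
  "<path><to>",
  "<node><NAME>A</NAME><group><NAME>B</NAME></group></node>",
  "<node><NAME>X</NAME><NAME>Y</NAME></node>",
  "</to></path>")
doc <- xmlParse(xml_text, asText = TRUE)

# All node elements, in document order
nodes <- getNodeSet(doc, "/path/to/node")

# "./NAME" is evaluated relative to each node: direct children only
direct_counts <- sapply(nodes, function(n) length(getNodeSet(n, "./NAME")))

# ".//NAME" is also relative, but includes descendants at any depth
descendant_counts <- sapply(nodes, function(n) length(getNodeSet(n, ".//NAME")))

direct_counts      # 1 2
descendant_counts  # 2 2
```

For the original example both variants give the same result, because every NAME there is a direct child of its node.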
I hope this helps.