Data
I have an XML file with a structure like this (a large example, to show the flexibility that is needed):
<rootnode sth="something" descr="ex">
<tag sth="sth1" descr="ex" anoAttr="sth2">
<tag sth="sth3" descr="ex2" searchA="sth4" anoAttr="sth5">
<tag sth="sth6" descr="ex3" oAttr="sth7" searchA="sth8" anoAttr="sth9">
<tag sth="sth10" descr="ex4" oAttr="sth11" searchA="sth12" anoAttr="sth13">
<someContent/>
</tag>
<someContent/>
</tag>
<tag sth="sth14" descr="ex5" oAttr="sth15" searchA="sth16" anoAttr="sth17">
<someContent/>
</tag>
<tag sth="sth1" descr="ex6" oAttr="sth15" searchA="sth18" anoAttr="sth17">
<someContent/>
</tag>
</tag>
<tag sth="sth10" descr="ex2" oAttr="sth19" searchA="sth20" anoAttr="sth9">
<someContent/>
</tag>
<tag sth="sth10" descr="ex7" searchA="sth21" anoAttr="sth13">
<tag sth="sth21" descr="ex8" oAttr="sth22" searchA="sth23" anoAttr="sth9">
<tag sth="sth23" descr="ex9" oAttr="sth22" searchA="sth24" anoAttr="sth5">
<someContent/>
</tag>
<someContent/>
</tag>
</tag>
</tag>
<otherNode>
<someNode/>
</otherNode>
</rootnode>
Specifically, the size of any of the tag nodes is unknown, the number of attributes is not equal for all tag nodes, and the values of the attributes are not unique.
What I do know, however, is that the value of the searchA attribute is unique. Also, only tag nodes can contain an attribute called searchA, and all of them except the top-level one do.
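These two assumptions can be checked on a parsed document. What follows is only a sketch on a shortened, made-up stand-in for the sample; the names txt, doc, vals and strays are introduced here for illustration. It relies on two XPath expressions: //tag[@searchA] selects every tag node that carries the attribute, and //*[@searchA and not(self::tag)] selects any other node that carries it.

```r
library(XML)

# Shortened, made-up stand-in for the sample document
txt <- paste0(
  '<rootnode sth="something" descr="ex">',
  '<tag sth="sth1" descr="ex" anoAttr="sth2">',
  '<tag sth="sth3" searchA="sth4"><someContent/></tag>',
  '<tag sth="sth10" searchA="sth21">',
  '<tag sth="sth21" searchA="sth23"><someContent/></tag>',
  '</tag>',
  '</tag>',
  '<otherNode><someNode/></otherNode>',
  '</rootnode>'
)
doc <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)

# searchA values of every tag node that has the attribute
vals <- xpathSApply(doc, "//tag[@searchA]", xmlGetAttr, "searchA")

# Any node that carries searchA but is not a tag node
strays <- getNodeSet(doc, "//*[@searchA and not(self::tag)]")

valuesAreUnique <- anyDuplicated(vals) == 0
onlyTagsHaveIt  <- length(strays) == 0
```

If either flag came out FALSE on the real file, a lookup by searchA could match more than one node, or a node of the wrong kind.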
Before
I first parse this document using the XML package with the function xmlTreeParse() and store the root node. I then create a new node using newXMLNode().
library(XML)
# Parse into internal (C-level) nodes, which are modified in place
xmlfile = xmlTreeParse(filename, useInternalNodes = TRUE)
xmltop = xmlRoot(xmlfile)
newNode = newXMLNode(name = "newlyCreatedNode")
Goal
My goal is to insert my newly created newNode as a child of the node that has a certain value (for example "sth23") as the searchA attribute.
So in this case I want the result to look like this (notice the <newlyCreatedNode/> near the bottom):
<rootnode sth="something" descr="ex">
<tag sth="sth1" descr="ex" anoAttr="sth2">
<tag sth="sth3" descr="ex2" searchA="sth4" anoAttr="sth5">
<tag sth="sth6" descr="ex3" oAttr="sth7" searchA="sth8" anoAttr="sth9">
<tag sth="sth10" descr="ex4" oAttr="sth11" searchA="sth12" anoAttr="sth13">
<someContent/>
</tag>
<someContent/>
</tag>
<tag sth="sth14" descr="ex5" oAttr="sth15" searchA="sth16" anoAttr="sth17">
<someContent/>
</tag>
<tag sth="sth1" descr="ex6" oAttr="sth15" searchA="sth18" anoAttr="sth17">
<someContent/>
</tag>
</tag>
<tag sth="sth10" descr="ex2" oAttr="sth19" searchA="sth20" anoAttr="sth9">
<someContent/>
</tag>
<tag sth="sth10" descr="ex7" searchA="sth21" anoAttr="sth13">
<tag sth="sth21" descr="ex8" oAttr="sth22" searchA="sth23" anoAttr="sth9">
<tag sth="sth23" descr="ex9" oAttr="sth22" searchA="sth24" anoAttr="sth5">
<someContent/>
</tag>
<someContent/>
<newlyCreatedNode/>
</tag>
</tag>
</tag>
<otherNode>
<someNode/>
</otherNode>
</rootnode>
Basically, in this case addChildren(xmltop[[1]][[3]][[1]], kids = list(newNode)) gets me the result that I want. Of course, I do not want to hard-code the position [[1]][[3]][[1]].
What I tried
I can get a list of all relevant nodes with xmlElementsByTagName() and get all attributes with xmlAttrs(). I can even get a logical index vector which gives me the correct location.
# All tag nodes at any depth
listOfNodes = xmlElementsByTagName(el = xmltop, "tag", recursive = TRUE)
# Attributes of each node as a named character vector
attributeList = lapply(listOfNodes, xmlAttrs)
# TRUE where searchA matches, NA where a node has no searchA attribute
indexVector = sapply(attributeList, function(x) x["searchA"] == "sth23")
indexVector[is.na(indexVector)] = FALSE
listOfNodes[indexVector]
What I do not know is how to use this information to insert my node into the tree at the correct location. listOfNodes[indexVector] gives me the correct node, but it is now a list and not a node I can use addChildren() on.
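One detail about that list: single brackets keep the list wrapper, while double brackets return the element itself. A minimal sketch on a shortened, made-up document (txt and target are names introduced here), assuming the document was parsed with useInternalNodes = TRUE so that each node is a reference into the tree and addChildren() changes the document in place:

```r
library(XML)

# Shortened, made-up stand-in for the sample document
txt <- paste0(
  '<rootnode>',
  '<tag sth="sth1">',
  '<tag sth="sth10" searchA="sth21">',
  '<tag sth="sth21" searchA="sth23"><someContent/></tag>',
  '</tag>',
  '</tag>',
  '</rootnode>'
)
xmlfile <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)
xmltop  <- xmlRoot(xmlfile)
newNode <- newXMLNode("newlyCreatedNode")

# Same idea as in the question: collect all tag nodes, flag the match
listOfNodes <- xmlElementsByTagName(el = xmltop, "tag", recursive = TRUE)
indexVector <- sapply(listOfNodes, function(x) {
  xmlGetAttr(x, "searchA", default = "") == "sth23"
})

# Single brackets: still a list of length one
class(listOfNodes[indexVector])

# Double brackets: the node itself, which addChildren() accepts
target <- listOfNodes[[which(indexVector)]]
addChildren(target, kids = list(newNode))

# The change is visible when the tree is serialised from the root
saveXML(xmltop)
```

This depends on exactly one element of indexVector being TRUE, which the uniqueness of searchA is supposed to guarantee.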
Even if I somehow managed to map the indexVector and the xmlSize() of all nodes to the correct indices that I could use on xmltop directly, I would still have the problem of a variable number of double brackets (xmltop[[1]][[3]] vs xmltop[[1]][[2]][[1]]).
I have also tried several other functions of the XML package, including xmlApply, getNodeLocation and getNodeSet, but they did not seem to help.
What I have not really tried
I do not really understand the difference between xmlTreeParse(), xmlInternalTreeParse() and xmlTreeParse(useInternalNodes = T), and I cannot wrap my head around XPath, so I did not get very far trying to use it.
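As far as the package documentation goes, xmlInternalTreeParse() (like xmlParse()) behaves as xmlTreeParse() with useInternalNodes = TRUE as the default, so the last two calls should produce the same kind of document. For the XPath side, one possible direction, as a sketch only: the expression //tag[@searchA='sth23'] reads as "any tag element, at any depth, whose searchA attribute equals sth23". The sample text and the names txt, doc, hits and target below are made up for illustration.

```r
library(XML)

# Shortened, made-up stand-in for the sample document
txt <- paste0(
  '<rootnode>',
  '<tag sth="sth1">',
  '<tag sth="sth10" searchA="sth21">',
  '<tag sth="sth21" searchA="sth23">',
  '<tag sth="sth23" searchA="sth24"><someContent/></tag>',
  '<someContent/>',
  '</tag>',
  '</tag>',
  '</tag>',
  '</rootnode>'
)
doc <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)
newNode <- newXMLNode("newlyCreatedNode")

# Every tag element, at any depth, whose searchA attribute equals "sth23"
hits <- getNodeSet(doc, "//tag[@searchA='sth23']")
stopifnot(length(hits) == 1)  # relies on searchA being unique

# getNodeSet() returns a list; [[1]] is the node itself
target <- hits[[1]]
addChildren(target, kids = list(newNode))

# Serialise the modified document to inspect the result
out <- saveXML(doc)
```

Because the lookup goes by attribute value rather than by position, the depth of the target node and the number of double brackets needed to reach it no longer matter.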
Any helpful pointers would be much appreciated.