Data
I have an xml file with a structure like this (large example to show the needed flexibility):
<rootnode sth="something" descr="ex">
<tag sth="sth1" descr="ex" anoAttr="sth2">
<tag sth="sth3" descr="ex2" searchA="sth4" anoAttr="sth5">
<tag sth="sth6" descr="ex3" oAttr="sth7" searchA="sth8" anoAttr="sth9">
<tag sth="sth10" descr="ex4" oAttr="sth11" searchA="sth12" anoAttr="sth13">
<someContent/>
</tag>
<someContent/>
</tag>
<tag sth="sth14" descr="ex5" oAttr="sth15" searchA="sth16" anoAttr="sth17">
<someContent/>
</tag>
<tag sth="sth1" descr="ex6" oAttr="sth15" searchA="sth18" anoAttr="sth17">
<someContent/>
</tag>
</tag>
<tag sth="sth10" descr="ex2" oAttr="sth19" searchA="sth20" anoAttr="sth9">
<someContent/>
</tag>
<tag sth="sth10" descr="ex7" searchA="sth21" anoAttr="sth13">
<tag sth="sth21" descr="ex8" oAttr="sth22" searchA="sth23" anoAttr="sth9">
<tag sth="sth23" descr="ex9" oAttr="sth22" searchA="sth24" anoAttr="sth5">
<someContent/>
</tag>
<someContent/>
</tag>
</tag>
</tag>
<otherNode>
<someNode/>
</otherNode>
</rootnode>
Specifically, the size of any of the tag nodes is unknown, the number of attributes is not the same for all tag nodes, and the values of the attributes are not unique.
What I do know, however, is that the value of the searchA attribute is unique. Also, only tag nodes can have an attribute called searchA, and all of them except the top-level one do.
Before
I first parse this document using the XML package with the function xmlTreeParse() and store the root node. I then create a new node using newXMLNode().
library(XML)
# filename is the path to the xml file shown above
xmlfile = xmlTreeParse(filename, useInternalNodes = TRUE)
xmltop = xmlRoot(xmlfile)
newNode = newXMLNode(name = "newlyCreatedNode")
Goal
My goal is to insert my newly created newNode as a child of the node whose searchA attribute has a certain value (for example "sth23").
So in this case I want the result to look like this (notice the <newlyCreatedNode/> near the bottom):
<rootnode sth="something" descr="ex">
<tag sth="sth1" descr="ex" anoAttr="sth2">
<tag sth="sth3" descr="ex2" searchA="sth4" anoAttr="sth5">
<tag sth="sth6" descr="ex3" oAttr="sth7" searchA="sth8" anoAttr="sth9">
<tag sth="sth10" descr="ex4" oAttr="sth11" searchA="sth12" anoAttr="sth13">
<someContent/>
</tag>
<someContent/>
</tag>
<tag sth="sth14" descr="ex5" oAttr="sth15" searchA="sth16" anoAttr="sth17">
<someContent/>
</tag>
<tag sth="sth1" descr="ex6" oAttr="sth15" searchA="sth18" anoAttr="sth17">
<someContent/>
</tag>
</tag>
<tag sth="sth10" descr="ex2" oAttr="sth19" searchA="sth20" anoAttr="sth9">
<someContent/>
</tag>
<tag sth="sth10" descr="ex7" searchA="sth21" anoAttr="sth13">
<tag sth="sth21" descr="ex8" oAttr="sth22" searchA="sth23" anoAttr="sth9">
<tag sth="sth23" descr="ex9" oAttr="sth22" searchA="sth24" anoAttr="sth5">
<someContent/>
</tag>
<someContent/>
<newlyCreatedNode/>
</tag>
</tag>
</tag>
<otherNode>
<someNode/>
</otherNode>
</rootnode>
Basically, in this case addChildren(xmltop[[1]][[3]][[1]], kids = list(newNode)) gets me the result that I want. Of course, I do not want to hard-code [[1]][[3]][[1]].
What I tried
I can get a list of all relevant nodes with xmlElementsByTagName() and get all attributes with xmlAttrs(). I can even get a logical index vector which gives me the correct location.
listOfNodes = xmlElementsByTagName(el = xmltop, "tag", recursive = TRUE)
attributeList = lapply(listOfNodes, FUN = function(x) xmlAttrs(x))
indexVector = sapply(attributeList, FUN = function(x) x["searchA"] == "sth23")
indexVector[is.na(indexVector)] = FALSE
listOfNodes[indexVector]
What I do not know is how to use this information to insert my node into the tree at the correct location. listOfNodes[indexVector] gives me the correct node, but it is now a list and not a node I can use addChildren() on.
Even if I somehow managed to map the indexVector and the xmlSize() of all nodes to the correct indices that I could use on xmltop directly, I would still have the problem of a variable number of double brackets (xmltop[[1]][[3]] vs xmltop[[1]][[2]][[1]]).
I have also tried several other functions of the XML package, including xmlApply, getNodeLocation and getNodeSet, but they did not seem to help.
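For reference, getNodeSet() takes an XPath expression and, on an internally parsed document, returns the matching nodes themselves. A minimal sketch, assuming a small inline document as a stand-in for the real file (the document text and the names doc, hits and target are made up for illustration):

```r
library(XML)

# Small stand-in document (an assumption; the real file is much larger)
doc <- xmlParse('<rootnode>
  <tag sth="sth1">
    <tag sth="sth10" searchA="sth21">
      <tag sth="sth21" searchA="sth23"><someContent/></tag>
    </tag>
  </tag>
</rootnode>', asText = TRUE)

# "//tag" matches tag elements at any depth; the predicate in brackets
# keeps only those whose searchA attribute equals "sth23"
hits <- getNodeSet(doc, "//tag[@searchA='sth23']")

# searchA is unique, so a single match is expected; [[1]] extracts the node
target <- hits[[1]]

newNode <- newXMLNode("newlyCreatedNode")
addChildren(target, kids = list(newNode))

# Inspect the modified tree
print(doc)
```

Because the search is done by attribute value, no chain of [[1]][[3]][[1]] indices is needed, regardless of how deeply the node is nested.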
What I have not really tried
I do not really understand the difference between xmlTreeParse(), xmlInternalTreeParse() and xmlTreeParse(useInternalNodes = T), and I cannot wrap my head around XPath, so I did not get very far trying to use it.
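As far as I can tell, the practical difference is what kind of object comes back: the default xmlTreeParse() converts the document into ordinary R objects, while the other two keep it as C-level (libxml2) nodes that are modified in place. A minimal sketch to check this, assuming a tiny inline document (xmlText and the three variable names are made up for illustration):

```r
library(XML)

# Tiny stand-in document (an assumption, not the real file)
xmlText <- '<rootnode><tag searchA="sth23"><someContent/></tag></rootnode>'

# Default: the tree is converted to ordinary R objects
asRObjects <- xmlTreeParse(xmlText, asText = TRUE)

# Both of these keep the tree as internal (C-level) nodes
asInternal1 <- xmlTreeParse(xmlText, asText = TRUE, useInternalNodes = TRUE)
asInternal2 <- xmlInternalTreeParse(xmlText, asText = TRUE)

# Compare the classes of the three results
class(asRObjects)
class(asInternal1)
class(asInternal2)
```

Only the internal form supports XPath functions such as getNodeSet() directly on the document, and changes made through a node taken from it show up in the original tree.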
Any helpful pointers would be much appreciated.
The reason for my confusion was the help page for ?xmlElementsByTagName. What it says there made me think that the function returns a list of copies instead of references to the nodes themselves.
This might possibly be the case if the xml was parsed with the flag useInternalNodes of the xmlTreeParse() function set to FALSE, but if it is set to TRUE when parsing, the list returned by xmlElementsByTagName() seems to contain the actual references. These can easily be manipulated using, for example, addChildren().
In short, the very simple solution to my problem is:
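A minimal sketch of that idea, assuming the tree was parsed with useInternalNodes = TRUE and reusing the variable names from the question. The small inline document is a stand-in for the real file, and pulling the single match out of the list with [[1]] is an assumption about how the final step is written:

```r
library(XML)

# Stand-in document (an assumption; the real file is larger)
xmlText <- '<rootnode sth="something">
  <tag sth="sth1" descr="ex">
    <tag sth="sth10" descr="ex7" searchA="sth21">
      <tag sth="sth21" descr="ex8" searchA="sth23"><someContent/></tag>
    </tag>
  </tag>
</rootnode>'

xmlfile = xmlTreeParse(xmlText, asText = TRUE, useInternalNodes = TRUE)
xmltop = xmlRoot(xmlfile)
newNode = newXMLNode(name = "newlyCreatedNode")

# Same lookup as in the question: collect all tag nodes, then flag the
# one whose searchA attribute is "sth23"
listOfNodes = xmlElementsByTagName(el = xmltop, "tag", recursive = TRUE)
attributeList = lapply(listOfNodes, FUN = function(x) xmlAttrs(x))
indexVector = sapply(attributeList, FUN = function(x) x["searchA"] == "sth23")
indexVector[is.na(indexVector)] = FALSE

# [[1]] extracts the node itself from the one-element list
addChildren(listOfNodes[indexVector][[1]], kids = list(newNode))

# Inspect the modified tree
print(xmltop)
```

Since searchA is unique, listOfNodes[indexVector] has exactly one element, so [[1]] is enough to get from the list to the node.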