I have two XML files of similar structure which I wish to merge into one file. Currently I am using EL4J XML Merge which I came across in this tutorial. However it does not merge as I expect it to for instances the main problem is its not merging the from both files into one element aka one that contains 1, 2, 3 and 4. Instead it just discards either 1 and 2 or 3 and 4 depending on which file is merged first.
So I would be grateful to anyone who has experience with XML Merge if they could tell me what I might be doing wrong or alternatively does anyone know of a good XML API for Java that would be capable of merging the files as I require?
Many Thanks for Your Help in Advance
Edit:
Could really do with some good suggestions on doing this so added a bounty. I've tried jdigital's suggestion but still having issues with XML merge.
Below is a sample of the type of structure of XML files that I am trying to merge.
<run xmloutputversion="1.02">
<info type="a" />
<debugging level="0" />
<host starttime="1237144741" endtime="1237144751">
<status state="up" reason="somereason"/>
<something avalue="test" test="alpha" />
<target>
<system name="computer" />
</target>
<results>
<result id="1">
<state value="test" />
<service value="gamma" />
</result>
<result id="2">
<state value="test4" />
<service value="gamma4" />
</result>
</results>
<times something="0" />
</host>
<runstats>
<finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
<result total="0" />
</runstats>
</run>
<run xmloutputversion="1.02">
<info type="b" />
<debugging level="0" />
<host starttime="1237144741" endtime="1237144751">
<status state="down" reason="somereason"/>
<something avalue="test" test="alpha" />
<target>
<system name="computer" />
</target>
<results>
<result id="3">
<state value="testagain" />
<service value="gamma2" />
</result>
<result id="4">
<state value="testagain4" />
<service value="gamma4" />
</result>
</results>
<times something="0" />
</host>
<runstats>
<finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
<result total="0" />
</runstats>
</run>
Expected output
<run xmloutputversion="1.02">
<info type="a" />
<debugging level="0" />
<host starttime="1237144741" endtime="1237144751">
<status state="down" reason="somereason"/>
<status state="up" reason="somereason"/>
<something avalue="test" test="alpha" />
<target>
<system name="computer" />
</target>
<results>
<result id="1">
<state value="test" />
<service value="gamma" />
</result>
<result id="2">
<state value="test4" />
<service value="gamma4" />
</result>
<result id="3">
<state value="testagain" />
<service value="gamma2" />
</result>
<result id="4">
<state value="testagain4" />
<service value="gamma4" />
</result>
</results>
<times something="0" />
</host>
<runstats>
<finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
<result total="0" />
</runstats>
</run>
You can try Dom4J which provides a very good means to extract information using XPath Queries and also allows you to write XML very easily. You just need to play around with the API for a while to do your job
Have you considered just not bothering with parsing the XML "properly" and just treating the files as big long strings and using boring old things such as hash maps and regular expressions...? This could be one of those cases where the fancy acronyms with X in them just make the job fiddlier than it needs to be.
Obviously this does depend a bit on how much data you actually need to parse out while doing the merge. But by the sound of things, the answer to that is not much.
Not very elegant, but you could do this with the DOM parser and XPath:
This assumes that you can hold at least two of the documents in RAM simultaneously.
Thanks to everyone for their suggestions unfortunately none of the methods suggested turned out to be suitable in the end, as I needed to have rules for the way in which different nodes of the structure where mereged.
So what I did was take the DTD relating to the XML files I was merging and from that create a number of classes reflecting the structure. From this I used XStream to unserialize the XML file back into classes.
This way I annotated my classes making it a process of using a combination of the rules assigned with annotations and some reflection in order to merge the Objects as opposed to merging the actual XML structure.
If anyone is interested in the code which in this case merges Nmap XML files please see http://fluxnetworks.co.uk/NmapXMLMerge.tar.gz the codes not perfect and I will admit not massively flexible but it definitely works. I'm planning to reimplement the system with it parsing the DTD automatically when I have some free time.
I use XSLT to merge XML files. It allows me to adjust the merge operation to just slam the content together or to merge at an specific level. It is a little more work (and XSLT syntax is kind of special) but super flexible. A few things you need here
a) Include an additional file b) Copy the original file 1:1 c) Design your merge point with or without duplication avoidance
a) In the beginning I have
this allows to point to the second file using $mDoc
b) The instructions to copy a source tree 1:1 are 2 templates:
With nothing else you get a 1:1 copy of your first source file. Works with any type of XML. The merging part is file specific. Let's presume you have event elements with an event ID attribute. You do not want duplicate IDs. The template would look like this:
Of course you can compare other things like tag names etc. Also it is up to you how deep the merge happens. If you don't have a key to compare, the construct becomes easier e.g. for log:
To run XSLT in Java use this:
or you download the Saxon SAX Parser and do it from the command line (Linux shell example):
YMMV
I took a look at the referenced link; it's odd that XMLMerge would not work as expected. Your example seems straightforward. Did you read the section entitled Using XPath declarations with XmlMerge? Using the example, try to set up an XPath for results and set it to merge. If I'm reading the doc correctly, it would look something like this: