Background
I am using the xml-flow npm package to parse XML using streams. The issue is that the XML nodes are getting parsed in an unexpected way.
My intention is to parse a huge XML file using a repeating XML node. The XML file can come from any URL, and the repeating node name will be provided from the UI.
I tried the options with all possible values, but the parsing behaviour doesn't seem to change.
Sample Code
I used the following sample XML -
<list>
    <item>
        <details>
            <id>1</id>
        </details>
    </item>
    <item>
        <details>
            <id>2</id>
            <description>description for item 2</description>
        </details>
    </item>
</list>
I tried to parse it using item as the repeating node, as follows -
const fs = require("fs");
const flow = require("xml-flow");

// Stream the file through xml-flow instead of loading it all into memory
const xmlStream = flow(fs.createReadStream("./sample.xml"));

// Fires once for every closed <item> node
xmlStream.on('tag:item', function (item) {
    console.log(JSON.stringify(item, null, 4));
});
I got the following response for the 2 parsed XML nodes -
// node 1
{
    "$name": "item",
    "details": "1"
}
// node 2
{
    "$name": "item",
    "details": {
        "id": "2",
        "description": "description for item 2"
    }
}
Problem
As you can see in the response, I get a different JSON structure for the parsed XML nodes.
In the case of the first XML node, the <id> node doesn't appear in the JSON object (unlike the second XML node) because its parent node, <details>, has only one child node, <id>.
This is causing problems in my application, as the parsed XML might have thousands of records and the relative path in the JSON structure to the leaf nodes changes because of this behaviour.
As an example, if there are 10000 records in the XML file and all the records after the 5000th record have the node 2 structure, the item.details relative path will point to a string for records 1 to 5000, whereas the same path will point to an object for the remaining records.
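To make the inconsistency concrete, here is a minimal sketch that uses the two records exactly as they were printed above. The readId helper is hypothetical and only stands in for whatever code would consume the parsed records by a fixed path.

```javascript
// The two records, copied from the xml-flow output shown earlier.
const node1 = { $name: "item", details: "1" };
const node2 = {
    $name: "item",
    details: { id: "2", description: "description for item 2" }
};

// Hypothetical consumer: reads the id by the path one would expect from the XML.
function readId(record) {
    return record.details.id;
}

console.log(typeof node1.details); // string
console.log(typeof node2.details); // object
console.log(readId(node1));        // undefined, because "1".id does not exist
console.log(readId(node2));        // 2
```

The same path silently yields undefined for the first record, so any consumer has to check the type of every intermediate node before reading it.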
Alternative NPM Package
I did try xml-stream, which works on the same logic, but it comes with the problem of collecting the sub-items (explained here), which is an even more complicated problem for me, as the incoming XML structure in this case will vary from file to file.
Let me know if I need to provide more information.
Cheers!
Well! After going through the implementation of these packages, it seems there is no workaround for this problem (I might have missed something) unless explicit support is provided.
I finally decided to write new logic and ended up writing a new npm package, xtreamer, which provides XML nodes instead of converting them into JSON objects.
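The idea of handing back raw XML nodes from a stream, rather than JSON objects, can be sketched with nothing but Node's built-in stream module. This is not xtreamer's actual implementation: createNodeExtractor is a hypothetical helper, and the plain string search assumes the repeating node has no attributes, is not self-closing, and is never nested inside itself. Only the xmldata event name is borrowed from xtreamer.

```javascript
const { Transform } = require("stream");
const { StringDecoder } = require("string_decoder");

// Hypothetical helper (an assumption, not xtreamer's code): emits every complete
// <nodeName>...</nodeName> fragment as a raw XML string through an "xmldata" event.
function createNodeExtractor(nodeName) {
    const open = `<${nodeName}>`;
    const close = `</${nodeName}>`;
    const decoder = new StringDecoder("utf8"); // keeps multi-byte characters intact across chunks
    let buffer = "";

    return new Transform({
        transform(chunk, encoding, callback) {
            buffer += decoder.write(chunk);
            let end;
            // Emit every node that is now complete; keep the unfinished tail for the next chunk.
            while ((end = buffer.indexOf(close)) !== -1) {
                const start = buffer.indexOf(open);
                if (start !== -1 && start < end) {
                    this.emit("xmldata", buffer.slice(start, end + close.length));
                }
                buffer = buffer.slice(end + close.length);
            }
            callback();
        }
    });
}
```

Under those assumptions, something like fs.createReadStream("./sample.xml").pipe(createNodeExtractor("item")) with an xmldata listener would receive each item as an XML string, whatever the chunk boundaries are. Each string could then be passed to an XML-to-JSON converter of choice, so the JSON shape is decided per node rather than by the streaming parser.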
This package exposes a transform stream that can be piped with any readable stream. It expects the XML node name in the request and emits a custom event, xmldata, to output the XML node.
The output can then be plugged into any xml-json npm package as per the requirement to get the final JSON. Check the npm package for further details.
Supporting module
I managed to create one more npm package, xtagger, which uses the sax npm package and provides the XML structure in the following format -
This package can be used to find the repeating nodes in an XML file by considering their hierarchy.