xml to json mapping challenge

Posted 2019-04-07 18:12

Question:

At first glance, I thought using xml data in javascript would be as simple as finding an xml-to-json library and turning my xml into a javascript object tree.

Now, however, I'm realizing that it's possible to create structures in xml that don't map directly to json.

Specifically, this:

<parentNode>
    <fooNode>data1</fooNode>
    <barNode>data2</barNode>
    <fooNode>data3</fooNode>
</parentNode>

The xml-to-json tools I've found convert the previous to something like this:

{
    parentnode: {
        foonode: [
            'data1',
            'data3'
        ],
        barnode: 'data2'
    }
}

in which the order of the child nodes has been changed. I need to preserve the order of my child nodes. Does anyone have a solution that's more elegant than

a) abandoning the idea of automatic conversion and just designing my own javascript object structure and writing code to handle this specific xml schema

or

b) abandoning the idea of any conversion at all, and leaving my xml data as an xml document which I'll then traverse?

Answer 1:

There are established mappings from XML to JSON with limitations (see Converting Between XML and JSON) and mappings from JSON to XML (see JSONx as defined here and conversion rules by IBM). A mapping from XML to JSON that preserves order, however, has not been defined yet. To fully capture all aspects of XML, one should express the XML Infoset in JSON. If you only care about XML elements (no processing instructions, etc.), I'd choose this structure:

[
  "parentNode",
  { }, /* attributes */
  [
    [ "fooNode", { }, [ "data1" ] ],
    [ "barNode", { }, [ "data2" ] ],
    [ "fooNode", { }, [ "data3" ] ]
  ]
]

I implemented the same mapping between XML and Perl data structures (which are just like JSON) in XML::Struct. The structure also corresponds to the abstract data model of MicroXML, a simplified subset of XML.
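To make the structure above concrete, here is a minimal JavaScript sketch that builds the `[name, attributes, children]` arrays from a DOM-like tree. The function name `elementToArray` and the plain-object stand-ins for DOM nodes are assumptions made for illustration; in a browser you would pass a real element obtained from an XML parser instead. Comments, processing instructions and whitespace-only text are simply skipped.

```javascript
// Hypothetical helper: converts a DOM-like element into [name, attributes, children].
// Only element nodes (nodeType 1) and text nodes (nodeType 3) are handled.
function elementToArray(node) {
  const attributes = {};
  for (const attr of Array.from(node.attributes || [])) {
    attributes[attr.name] = attr.value;
  }
  const children = [];
  for (const child of Array.from(node.childNodes || [])) {
    if (child.nodeType === 1) {
      children.push(elementToArray(child)); // nested element, kept in place
    } else if (child.nodeType === 3 && child.nodeValue.trim() !== "") {
      children.push(child.nodeValue); // non-blank text
    }
  }
  return [node.nodeName, attributes, children];
}

// Plain-object stand-ins for DOM nodes, so the sketch runs without a parser.
const text = (value) => ({ nodeType: 3, nodeValue: value });
const element = (name, childNodes) => ({
  nodeType: 1,
  nodeName: name,
  attributes: [],
  childNodes,
});

const parentNode = element("parentNode", [
  element("fooNode", [text("data1")]),
  element("barNode", [text("data2")]),
  element("fooNode", [text("data3")]),
]);

// Children stay in document order: fooNode, barNode, fooNode.
const orderedTree = elementToArray(parentNode);
console.log(JSON.stringify(orderedTree));
```

Because children live in an array rather than under object keys, repeated element names never get merged and the original order survives a round trip through `JSON.stringify`.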



Answer 2:

If you need the same element name often and you care about ordering, it might be better to stay with XML. What benefits do you expect from using JSON?



Answer 3:

Why not try:

{ parentNode: [
  ["fooNode", "data1"],
  ["barNode", "data2"],
  ["fooNode", "data3"] ]
}

I think it would more or less solve the problem.

And yes, I think you should abandon automatic conversion if it's not flexible enough; instead you might look for an API that makes such mappings trivial.
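As a rough illustration of how the pair-based structure suggested in this answer could be consumed, the sketch below reads the children back in document order and also looks values up by element name. The helper `valuesOf` is a hypothetical name introduced here, not part of any library.

```javascript
// The structure from this answer: each child is a [name, value] pair,
// kept in document order.
const doc = {
  parentNode: [
    ["fooNode", "data1"],
    ["barNode", "data2"],
    ["fooNode", "data3"],
  ],
};

// Hypothetical helper: all values stored under a given element name, in order.
function valuesOf(pairs, name) {
  return pairs.filter(([n]) => n === name).map(([, value]) => value);
}

// Walking the array visits children in their original order.
const namesInOrder = doc.parentNode.map(([name]) => name);

console.log(namesInOrder); // fooNode, barNode, fooNode
console.log(valuesOf(doc.parentNode, "fooNode")); // data1, data3
```

The trade-off is that lookup by name becomes a linear scan instead of a property access, which is usually acceptable for small documents.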



Answer 4:

I devised this, recently:

(just a thought experiment)

var someTinyInfosetSample = {
  "doctype": "html",
  "$": [
    { "": "html" },
    [ { "": "head" },
       [ { "": "title" }, "Document title" ]
    ],
    [ { "": "body" },
      [ { "": "h1" }, "Header 1" ],
      [ { "": "p", "class": "content" },
        "Paragraph... (line 1)", [ { "": "br" } ],
        "... continued (line 2)"
      ]
    ]
  ] };

(at https://jsfiddle.net/YSharpLanguage/dzq4fe39)

Quick rationale:

XML elements are the only node type (besides the document root) that accepts mixed content (text nodes and/or other elements, comments, PIs) and defines an order of its child nodes; hence the use of JSON arrays (child indices then being 1-based instead of 0-based, because index 0 is reserved to carry the node type (element) info; note that XPath node-sets also use a 1-based index, btw);

XML attribute name/value maps don't need any ordering of the keys (attribute names) wrt. their owner element, only uniqueness of those at that element node; hence the use of a JSON object at index 0 of the container array (corresp. to the owner element);

and finally, after all, while "" is a perfectly valid JSON key in object values, it's also the case that neither XML elements nor attributes can have an empty name anyway... hence the use of "" as a special, conventional key, to provide the element name.
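The three conventions above can be read back with a few small accessors. This is only a sketch: the names `nameOf`, `attributesOf` and `childrenOf` are assumptions introduced for illustration and are not part of the fiddle.

```javascript
// One element in the convention described above:
// index 0 holds the start tag object, indices 1..n hold the children.
const sample = [
  { "": "p", "class": "content" },
  "Paragraph... (line 1)",
  [{ "": "br" }],
  "... continued (line 2)",
];

// Hypothetical accessors for the convention.
const isElement = (node) => Array.isArray(node);
const nameOf = (el) => el[0][""]; // the "" key carries the element name
const attributesOf = (el) =>
  Object.fromEntries(Object.entries(el[0]).filter(([key]) => key !== ""));
const childrenOf = (el) => el.slice(1); // children start at index 1

console.log(nameOf(sample)); // the element name, "p"
console.log(attributesOf(sample)); // only the real attributes
console.log(childrenOf(sample).map((c) => (isElement(c) ? nameOf(c) : "#text")));
```

Since index 0 is reserved for the start tag, `sample[1]` is the first child, which lines up with the 1-based positions used by XPath.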

And here's what it takes to turn it into HTML using my small "JSLT" (at https://jsfiddle.net/YSharpLanguage/c7usrpsL/10):

var tinyInfosetJSLT = { $: [
  [ [ function/*Root*/(node) { return node.$; } ],
      function(root) { return Per(this).map(root.$); }
  ],
  [ [ function/*Element*/(node) { return { }.toString.call(node) === "[object Array]"; } ],
      function(element) {
        var children = (element.length > 1 ? element.slice(1) : null),
            startTag = element[0],
            nodeName = startTag[""],
            self = this;
        return children ?
               Per("\r\n<{stag}>{content}</{etag}>\r\n").map
               ({
                 stag: Per(this).map(startTag),
                 etag: nodeName,
                 content: Per(children).map(function(child) { return Per(self).map(child); }).join("")
               })
               :
               Per("<{stag}/>").map({ stag: Per(this).map(startTag) });
      }
  ],
  [ [ function/*StartTag*/(node) { return node[""]; } ],
      function(startTag) {
        var tag = [ startTag[""] ];
        for (var attribute in startTag) {
          if (attribute !== "") {
            tag.push
            (
              Per("{name}=\"{value}\"").
              map({ name: attribute, value: startTag[attribute].replace(/"/g, "&quot;") })
            );
          }
        }
        return tag.join(" ");
      }
  ],
  [ [ function/*Text*/(node) { return typeof node === "string"; } ],
      function(text) {
        return text.
               replace(/\t/g, "&#x09;").
               replace(/\n/g, "&#x0A;").
               replace(/\r/g, "&#x0D;");
      }
  ]
] };

(Cf. https://jsfiddle.net/YSharpLanguage/dzq4fe39/1)

where,

Per(tinyInfosetJSLT).map(someTinyInfosetSample)

yields (as a string):

<html>
<head>
<title>Document title</title>
</head>

<body>
<h1>Header 1</h1>

<p class="content">Paragraph... (line 1)<br/>... continued (line 2)</p>
</body>
</html>

(The transform above could also easily be adapted to use a DOM node factory and build an actual DOM document instead of a string.)
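One way such an adaptation might look, independent of the "JSLT" helper, is a plain recursive function that takes a document-like factory. This is a sketch under assumptions: `buildNode` is a hypothetical name, and `doc` is assumed to expose `createElement` and `createTextNode`, with elements offering `setAttribute` and `appendChild`, as the browser's `document` does.

```javascript
// Hypothetical adaptation: builds nodes through a document-like factory `doc`
// instead of concatenating strings.
function buildNode(doc, node) {
  // Strings are text nodes in this convention.
  if (typeof node === "string") {
    return doc.createTextNode(node);
  }
  // Arrays are elements: start tag at index 0, children after it.
  const [startTag, ...children] = node;
  const el = doc.createElement(startTag[""]);
  for (const name of Object.keys(startTag)) {
    if (name !== "") {
      el.setAttribute(name, startTag[name]);
    }
  }
  for (const child of children) {
    el.appendChild(buildNode(doc, child)); // appended in array order
  }
  return el;
}
```

In a browser this could be called as `buildNode(document, someTinyInfosetSample.$)`, since `$` holds the root element array. No escaping is needed here because the factory handles text and attribute values itself.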

HTH,