What is the best way to convert JSON to XML and back? For example, the JSON below
{
"user": "gerry",
"likes": [1, 2, 4],
"followers": [
{
"name": "megan"
},
{
"name": "pupkin"
}
]
}
could be converted into XML like this (#1):
<?xml version="1.0" encoding="UTF-8" ?>
<user>gerry</user>
<likes>1</likes>
<likes>2</likes>
<likes>4</likes>
<followers>
<name>megan</name>
</followers>
<followers>
<name>pupkin</name>
</followers>
or like this (#2):
<?xml version="1.0" encoding="UTF-8"?>
<root>
<likes>
<element>1</element>
<element>2</element>
<element>4</element>
</likes>
<followers>
<element>
<name>megan</name>
</element>
<element>
<name>pupkin</name>
</element>
</followers>
<user>gerry</user>
</root>
In particular, the difference arises when converting arrays; converting object properties is trivial. I am also sure that there are other ways to convert JSON to XML.
So the question is: What is the best way? Are there any standards?
Another question: is there a way to express the conversion mapping itself in some mathematical form? E.g., is it possible to describe a mapping such that a conversion function, given the JSON object and the mapping object, would know exactly which XML to produce, and could reverse it too?
XML_1 = convert(JSON, mapping_1)
XML_2 = convert(JSON, mapping_2)
JSON = convert(XML_1, mapping_1)
JSON = convert(XML_2, mapping_2)
JSON = convert(XML_1, mapping_2) # Error!
You're obviously interested in the theory behind data serialization. I'll try to explain using the following headings.
- Problem with XML as a data serialization format
- Why other formats are favoured
- It's really about information and relationships
What I'm leading to is an introduction to the Semantic Web and how it expresses the same information in several different formats.
Problem with XML as a data serialization format
As you've discovered, there are several ways to structure data in XML. This is because XML started life as a document markup language. XML has no built-in way to describe simple data structures like lists or hashes.
Not self describing
Here's a simple example:
<data>
<user name="gerry"/>
</data>
This can be deserialized as a simple hash:
data.user.name = "gerry"
or less obviously as a list of hashes:
data.user[0].name = "gerry"
The fact is that a different XML document could contain multiple user tags:
<data>
<user name="gerry"/>
<user name="tom"/>
</data>
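To make the ambiguity concrete, here is a minimal Python sketch of a parser that guesses the shape from the data alone. The function name naive_to_dict is my own invention for illustration, not part of any library; it only looks at attributes of the root's direct children.

```python
import xml.etree.ElementTree as ET

def naive_to_dict(xml_text):
    """Guess the shape of the root's children from the document alone (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        value = dict(child.attrib)
        if child.tag not in result:
            result[child.tag] = value            # first occurrence: a plain hash
        elif isinstance(result[child.tag], list):
            result[child.tag].append(value)      # already a list: keep appending
        else:
            result[child.tag] = [result[child.tag], value]  # second occurrence: promote to a list
    return result

one = naive_to_dict('<data><user name="gerry"/></data>')
two = naive_to_dict('<data><user name="gerry"/><user name="tom"/></data>')
print(one)  # {'user': {'name': 'gerry'}}
print(two)  # {'user': [{'name': 'gerry'}, {'name': 'tom'}]}
```

The same tag comes back as a hash in one document and as a list in the other, so code consuming the result has to handle both shapes.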
XML schema to the rescue
The solution to this problem was to design a separate schema specification that describes how the document is formatted:
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="data">
<xs:complexType>
<xs:sequence>
<xs:element name="user" maxOccurs="unbounded" minOccurs="0">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute type="xs:string" name="name" use="optional"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The user element is described as occurring zero or more times within a sequence (minOccurs="0", maxOccurs="unbounded"), so this enables XML parsers to store this information in a list construct.
This is the approach taken by many web service frameworks which process XML data. The message format is described in the WSDL/XML schema and the programming code that processes the message is generated automatically.
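As a rough illustration of how a schema removes the guesswork, the Python sketch below reads a trimmed copy of the schema above and treats every element declared with maxOccurs="unbounded" as a list. This is not a schema validator and not how any particular web service framework works; repeating_elements and to_dict are hypothetical helpers, and matching on the element name alone is a simplification.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Trimmed version of the schema shown above (assumption: only the parts needed here).
XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="data">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="user" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="name" type="xs:string" use="optional"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
""".strip()

def repeating_elements(xsd_text):
    """Names of elements the schema declares as repeatable."""
    schema = ET.fromstring(xsd_text)
    return {el.get("name") for el in schema.iter(XS + "element")
            if el.get("maxOccurs") == "unbounded"}

def to_dict(xml_text, repeating):
    """Deserialize the root's children; repeatable elements always become lists."""
    root = ET.fromstring(xml_text)
    result = {tag: [] for tag in repeating}
    for child in root:
        value = dict(child.attrib)
        if child.tag in repeating:
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

repeating = repeating_elements(XSD)
single = to_dict('<data><user name="gerry"/></data>', repeating)
print(single)  # {'user': [{'name': 'gerry'}]}
```

With the schema in hand, a single user is still a list of one, and an empty data element yields an empty list rather than a missing key.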
Why other formats are favoured
Formats like JSON and YAML are specifically designed to serialize data.
They don't require schema documents in order to parse data unambiguously.
But even so, JSON and YAML don't solve all problems. While the data is more obvious at first glance, there are no standards for describing data structures.
Earlier I vilified XML schemas, but these can be really useful for determining whether a piece of data is programmatically usable (valid) or not. Even so, an XML Schema does not tell me the relationship between one piece of data and another.
It's really about information and relationships
The Semantic web movement is an attempt to create a self describing and collaborative internet. Problem is (IMHO) the associated standards are complex and difficult to understand and apply. The place to start is RDF:
It's designed as a generic information interchange format and cleverly works in a manner that is independent of how data is actually serialized.
Example
Your simple example, extended with two more users and expressed as RDF/XML:
<?xml version="1.0"?>
<rdf:RDF xmlns:user="http://myspotontheweb.com/user/1.0/" xmlns:ex="http://myspotontheweb.com/example/user/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://myspotontheweb.com/example/user/1">
<user:name>gerry</user:name>
<user:likes>1</user:likes>
<user:likes>2</user:likes>
<user:likes>4</user:likes>
</rdf:Description>
<rdf:Description rdf:about="http://myspotontheweb.com/example/user/2">
<user:name>tom</user:name>
<user:likes>2</user:likes>
<user:likes>4</user:likes>
<user:likes>6</user:likes>
<user:follows rdf:resource="http://myspotontheweb.com/example/user/1" />
</rdf:Description>
<rdf:Description rdf:about="http://myspotontheweb.com/example/user/3">
<user:name>felix</user:name>
<user:likes>3</user:likes>
<user:likes>5</user:likes>
<user:follows rdf:resource="http://myspotontheweb.com/example/user/1" />
</rdf:Description>
</rdf:RDF>
Each item of data has a unique identifier and a custom set of attributes:
- name
- likes
- follows : Used to link one RDF entity to another.
XML is just one way to express RDF. I prefer the more compact N3 RDF format:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix user: <http://myspotontheweb.com/user/1.0/> .
@prefix ex: <http://myspotontheweb.com/example/user/> .
ex:1 user:name "gerry" .
ex:1 user:likes "1" .
ex:1 user:likes "2" .
ex:1 user:likes "4" .
ex:2 user:name "tom" .
ex:2 user:likes "2" .
ex:2 user:likes "4" .
ex:2 user:likes "6" .
ex:2 user:follows ex:1 .
ex:3 user:name "felix" .
ex:3 user:likes "3" .
ex:3 user:likes "5" .
ex:3 user:follows ex:1 .
Again, note the custom prefix declarations at the top and the clear statement of what each piece of data ("triple" in RDF parlance) represents. I think this demonstrates that it's about information, not data format!
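To show why the serialization hardly matters, here is a small Python sketch that holds a few of the statements above as plain (subject, predicate, object) tuples and answers "who follows gerry?". It deliberately uses no RDF library; objects and subjects are hypothetical helpers, and storing both IRIs and literals as plain strings is a simplification a real RDF store would not make.

```python
EX = "http://myspotontheweb.com/example/user/"
USER = "http://myspotontheweb.com/user/1.0/"

# A subset of the statements from the N3 listing, one tuple per statement.
triples = [
    (EX + "1", USER + "name", "gerry"),
    (EX + "2", USER + "name", "tom"),
    (EX + "2", USER + "follows", EX + "1"),
    (EX + "3", USER + "name", "felix"),
    (EX + "3", USER + "follows", EX + "1"),
]

def objects(subject, predicate):
    """All objects stated for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(predicate, obj):
    """All subjects that have the given predicate pointing at obj."""
    return [s for s, p, o in triples if p == predicate and o == obj]

followers_of_gerry = [objects(s, USER + "name")[0]
                      for s in subjects(USER + "follows", EX + "1")]
print(followers_of_gerry)  # ['tom', 'felix']
```

The query works the same whether the statements were loaded from RDF/XML, N3, or anything else.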
And for completeness, the same RDF information presented in JSON-LD format:
{
"@graph": [
{
"@id": "http://myspotontheweb.com/example/user/3",
"http://myspotontheweb.com/user/1.0/follows": {
"@id": "http://myspotontheweb.com/example/user/1"
},
"http://myspotontheweb.com/user/1.0/likes": [
"3",
"5"
],
"http://myspotontheweb.com/user/1.0/name": "felix"
},
{
"@id": "http://myspotontheweb.com/example/user/2",
"http://myspotontheweb.com/user/1.0/follows": {
"@id": "http://myspotontheweb.com/example/user/1"
},
"http://myspotontheweb.com/user/1.0/likes": [
"2",
"6",
"4"
],
"http://myspotontheweb.com/user/1.0/name": "tom"
},
{
"@id": "http://myspotontheweb.com/example/user/1",
"http://myspotontheweb.com/user/1.0/likes": [
"2",
"4",
"1"
],
"http://myspotontheweb.com/user/1.0/name": "gerry"
}
]
}
Notes:
- There are multiple ways to express RDF as JSON. See also JSON+RDF.
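Because this particular JSON-LD document uses full IRIs as keys and has no @context, it can be flattened back into statements with nothing more than a JSON parser. The Python sketch below does that for one node from the listing; to_triples is a hypothetical helper that only handles the three value shapes appearing here (a string, a list of strings, or an object with @id), so it is no substitute for a real JSON-LD processor.

```python
import json

doc = json.loads("""
{
  "@graph": [
    {
      "@id": "http://myspotontheweb.com/example/user/2",
      "http://myspotontheweb.com/user/1.0/follows": {
        "@id": "http://myspotontheweb.com/example/user/1"
      },
      "http://myspotontheweb.com/user/1.0/likes": ["2", "6", "4"],
      "http://myspotontheweb.com/user/1.0/name": "tom"
    }
  ]
}
""")

def to_triples(document):
    """Flatten an @graph of nodes into a set of (subject, predicate, object) tuples."""
    found = set()
    for node in document["@graph"]:
        subject = node["@id"]
        for predicate, value in node.items():
            if predicate == "@id":
                continue
            values = value if isinstance(value, list) else [value]
            for v in values:
                obj = v["@id"] if isinstance(v, dict) else v  # a link or a literal
                found.add((subject, predicate, obj))
    return found

triples = to_triples(doc)
for t in sorted(triples):
    print(t)
```

The result is a set because the order of statements carries no meaning in RDF, which is also why the likes values appear in a different order in the JSON-LD listing than in the N3 one.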
Example graph
Once the information is expressed as RDF, its relationships to other data entities can be graphed visually.
RDF just the beginning
The Semantic Web goes a lot further; it only starts with RDF. There are XML Schema-like standards for publishing well understood relationships between triples. Using these, one can start to manipulate RDF data in very interesting ways.
I don't claim to be an expert in data processing. What I do acknowledge is that some very clever people have been looking at this problem for some time. The concepts are tough to learn, but worthwhile in order to better understand information theory.
You will want to use some variation of these two tools: json_decode() and PEAR::XML_Serializer.
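The same decode-then-serialize idea can be sketched in Python with only the standard library. This is not the PEAR API; json_to_xml and build are hypothetical names, and the array_style parameter is my own stand-in for the "mapping" the question asks about: "repeat" gives arrays in the style of XML #1 and "wrap" gives the style of XML #2. I assume every JSON key is a valid XML element name, and both styles get a root element so that the output is well-formed.

```python
import json
import xml.etree.ElementTree as ET

def build(parent, key, value, array_style):
    """Append value under parent as element(s) named key."""
    if isinstance(value, list):
        if array_style == "wrap":
            # Style #2: <likes><element>1</element>...</likes>
            wrapper = ET.SubElement(parent, key)
            for item in value:
                build(wrapper, "element", item, array_style)
        else:
            # Style #1: one <likes> element per array item
            for item in value:
                build(parent, key, item, array_style)
    elif isinstance(value, dict):
        node = ET.SubElement(parent, key)
        for k, v in value.items():
            build(node, k, v, array_style)
    else:
        # Scalars become text, so the number/string distinction is lost.
        ET.SubElement(parent, key).text = str(value)

def json_to_xml(json_text, array_style="wrap", root_name="root"):
    root = ET.Element(root_name)
    for key, value in json.loads(json_text).items():
        build(root, key, value, array_style)
    return ET.tostring(root, encoding="unicode")

source = '{"user": "gerry", "likes": [1, 2]}'
wrapped = json_to_xml(source, "wrap")
repeated = json_to_xml(source, "repeat")
print(wrapped)   # <root><user>gerry</user><likes><element>1</element><element>2</element></likes></root>
print(repeated)  # <root><user>gerry</user><likes>1</likes><likes>2</likes></root>
```

Going back from XML to JSON needs the same array_style, plus some way to recover types and to tell a one-item array from a scalar in the "repeat" style, which is exactly the information a schema or mapping has to carry.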