At work we are being asked to create XML files to pass data to another offline application. That application will then create a second XML file to pass back so that we can update some of our data. During the process we have been discussing the structure of the XML file with the team responsible for the other application.
The sample I came up with is essentially something like:
<INVENTORY>
  <ITEM serialNumber="something" location="something" barcode="something">
    <TYPE modelNumber="something" vendor="something"/>
  </ITEM>
</INVENTORY>
The other team said that this was not industry standard and that attributes should only be used for metadata. They suggested:
<INVENTORY>
  <ITEM>
    <SERIALNUMBER>something</SERIALNUMBER>
    <LOCATION>something</LOCATION>
    <BARCODE>something</BARCODE>
    <TYPE>
      <MODELNUMBER>something</MODELNUMBER>
      <VENDOR>something</VENDOR>
    </TYPE>
  </ITEM>
</INVENTORY>
I suggested the first layout because the resulting file is much smaller. There will be roughly 80,000 items in the file during transfer, and their suggestion turns out to be about three times larger than mine. I searched for the mysterious "industry standard" they mentioned. The closest I could find said that XML attributes should only be used for metadata, but it also acknowledged that the real debate is over what actually counts as metadata.
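To check the size difference on your own data, a quick measurement helps. The sketch below uses Python's standard `xml.etree.ElementTree`. The helper names and the placeholder value "something" are assumptions that mirror the samples above. It builds both layouts for a given number of items and compares the serialized byte counts. It writes no indentation, so the ratio it prints depends only on tag names and values and may not match the threefold figure, which presumably came from real data and formatting.

```python
import xml.etree.ElementTree as ET

# Placeholder value copied from the samples above; real data will differ.
VALUE = "something"


def attribute_style(n: int) -> bytes:
    """Build the attribute-based layout with n ITEM elements."""
    root = ET.Element("INVENTORY")
    for _ in range(n):
        # Keyword arguments to SubElement become attributes.
        item = ET.SubElement(root, "ITEM", serialNumber=VALUE,
                             location=VALUE, barcode=VALUE)
        ET.SubElement(item, "TYPE", modelNumber=VALUE, vendor=VALUE)
    return ET.tostring(root)


def element_style(n: int) -> bytes:
    """Build the element-based layout with n ITEM elements."""
    root = ET.Element("INVENTORY")
    for _ in range(n):
        item = ET.SubElement(root, "ITEM")
        for tag in ("SERIALNUMBER", "LOCATION", "BARCODE"):
            ET.SubElement(item, tag).text = VALUE
        type_el = ET.SubElement(item, "TYPE")
        for tag in ("MODELNUMBER", "VENDOR"):
            ET.SubElement(type_el, tag).text = VALUE
    return ET.tostring(root)


if __name__ == "__main__":
    n = 80_000
    a, e = len(attribute_style(n)), len(element_style(n))
    print(f"attributes: {a:,} bytes, elements: {e:,} bytes, ratio {e / a:.2f}")
```

Swapping in a few real records for `VALUE` (and adding the indentation you would actually ship) gives a more honest comparison than the placeholders.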
After that long-winded explanation (sorry), how do you determine what is metadata? And when designing the structure of an XML document, how should you decide whether to use an attribute or an element?
Some of the problems with attributes are:
If you use attributes as containers for data, you end up with documents that are difficult to read and maintain. Try to use elements to describe data. Use attributes only to provide information that is not relevant to the data.
Don't end up like this (this is not how XML should be used):
Source: http://www.w3schools.com/xml/xml_dtd_el_vs_attr.asp
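One concrete limitation of attributes follows directly from XML's well-formedness rules: an element cannot carry the same attribute name twice, whereas it can have any number of repeated child elements. The minimal Python sketch below demonstrates this with the standard-library parser. The `BARCODE` values and the `is_well_formed` helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Two values for the same field: repeated child elements are fine...
as_elements = "<ITEM><BARCODE>123</BARCODE><BARCODE>456</BARCODE></ITEM>"
barcodes = [b.text for b in ET.fromstring(as_elements).findall("BARCODE")]
print(barcodes)  # ['123', '456']

# ...but repeating an attribute name on one element is not well-formed XML,
# so the parser rejects the whole document.
as_attributes = '<ITEM barcode="123" barcode="456"/>'
print(is_well_formed(as_attributes))  # False
```

To hold several values in one attribute you would have to invent your own list syntax inside the string, which every consumer then has to parse by hand.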
I use this rule of thumb:
So yours is close. I would have done something like:
EDIT: Updated the original example based on feedback below.
It may depend on your usage. XML used to represent structured data generated from a database may work well with the field values placed as attributes.
However, XML used as a message transport is often better served by using more elements.
For example, let's say we had this XML, as proposed in the answer:
Now we want to send the ITEM element to a device to print the barcode, but there is a choice of encoding types. How do we represent the encoding type required? Suddenly we realise, somewhat belatedly, that the barcode wasn't a single atomic value; it may need to be qualified with the encoding required when it is printed.
The point is that unless you are building some kind of XSD or DTD, along with a namespace, to fix the structure in stone, you may be best served by leaving your options open.
IMO XML is at its most useful when it can be flexed without breaking existing code using it.
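A minimal Python sketch of that kind of flexing, assuming the barcode is carried as a child element. The attribute name `encoding` and the value `Code128` are hypothetical stand-ins for whatever qualifier the device needs. A reader written against the first version keeps working when the sender later adds the qualifier, and newer code can pick it up when present.

```python
import xml.etree.ElementTree as ET

# Version 1 of the message: BARCODE is a child element holding just the value.
V1 = "<ITEM><BARCODE>0123456789</BARCODE></ITEM>"

# Version 2: the sender qualifies the barcode with a (hypothetical) encoding.
V2 = '<ITEM><BARCODE encoding="Code128">0123456789</BARCODE></ITEM>'


def read_barcode(xml_text: str) -> str:
    """Existing consumer code, written against version 1."""
    return ET.fromstring(xml_text).findtext("BARCODE")


def read_encoding(xml_text: str, default: str = "unspecified") -> str:
    """Newer consumer code that understands the optional qualifier."""
    barcode = ET.fromstring(xml_text).find("BARCODE")
    return barcode.get("encoding", default)


for doc in (V1, V2):
    print(read_barcode(doc), read_encoding(doc))
```

Had the barcode been an attribute (`barcode="..."`), there would be nowhere to hang the qualifier except yet another sibling attribute on `ITEM`.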
It is arguable either way, but your colleagues are right in the sense that XML should be used for "markup" or meta-data around the actual data. For your part, you are right that it's sometimes hard to decide where the line between meta-data and data lies when modeling your domain in XML. In practice, what I do is pretend that anything in the markup is hidden and only the data outside the markup is readable. Does the document still make sense that way?
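The "hide the markup" test can be run mechanically. In this sketch, `itertext()` from Python's standard `xml.etree.ElementTree` yields only character data, so tags and attributes disappear. The sample values (`S1`, `L1`, and so on) and the `visible_text` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

ATTRS = ('<ITEM serialNumber="S1" location="L1" barcode="B1">'
         '<TYPE modelNumber="M1" vendor="V1"/></ITEM>')
ELEMS = ("<ITEM><SERIALNUMBER>S1</SERIALNUMBER><LOCATION>L1</LOCATION>"
         "<BARCODE>B1</BARCODE><TYPE><MODELNUMBER>M1</MODELNUMBER>"
         "<VENDOR>V1</VENDOR></TYPE></ITEM>")


def visible_text(xml_text: str) -> list:
    """Return the non-blank character data, i.e. what is left once all
    markup (tags and attributes) is hidden."""
    return [t for t in ET.fromstring(xml_text).itertext() if t.strip()]


print(visible_text(ATTRS))  # []
print(visible_text(ELEMS))  # ['S1', 'L1', 'B1', 'M1', 'V1']
```

By this heuristic the attribute layout treats everything as markup, which is essentially the other team's objection.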
XML is notoriously bulky. For transport and storage, compression is highly recommended if you can afford the processing power. XML compresses well, sometimes phenomenally well, because of its repetitiveness. I've had large files compress to less than 5% of their original size.
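A minimal sketch of measuring that with Python's standard `gzip` module. The generated inventory is hypothetical: the template values (`WAREHOUSE-1`, `MODEL-42`, `ACME`) are invented and far more repetitive than real data, so expect a less dramatic ratio on your actual files.

```python
import gzip

# Hypothetical element-style item; {0} is filled with a running number.
ITEM_TEMPLATE = (
    "<ITEM><SERIALNUMBER>SN{0:06d}</SERIALNUMBER><LOCATION>WAREHOUSE-1</LOCATION>"
    "<BARCODE>{0:012d}</BARCODE><TYPE><MODELNUMBER>MODEL-42</MODELNUMBER>"
    "<VENDOR>ACME</VENDOR></TYPE></ITEM>\n"
)


def build_document(n: int) -> bytes:
    """Build a hypothetical INVENTORY document with n items."""
    body = "".join(ITEM_TEMPLATE.format(i) for i in range(n))
    return f"<INVENTORY>\n{body}</INVENTORY>\n".encode("utf-8")


def gzip_ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(gzip.compress(data)) / len(data)


if __name__ == "__main__":
    doc = build_document(80_000)
    print(f"{len(doc):,} bytes, gzip ratio {gzip_ratio(doc):.1%}")
```

Because compression removes most of the cost of repeated tag names, it also narrows the size gap between the attribute and element layouts.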
Another point to bolster your position is that while the other team is arguing about style (most XML tools will handle an all-attribute document just as easily as an all-#PCDATA document), you are arguing practicalities. While style can't be totally ignored, technical merits should carry more weight.
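To illustrate how little the tooling cares, here is a minimal Python sketch (standard library only; the function names and sample values are hypothetical) that reads one ITEM from each layout into the same dictionary. The two readers differ only in `get()` versus `findtext()`.

```python
import xml.etree.ElementTree as ET

ATTRS = ('<ITEM serialNumber="S9" location="Dock 4" barcode="B9">'
         '<TYPE modelNumber="X100" vendor="Acme"/></ITEM>')
ELEMS = ("<ITEM><SERIALNUMBER>S9</SERIALNUMBER><LOCATION>Dock 4</LOCATION>"
         "<BARCODE>B9</BARCODE><TYPE><MODELNUMBER>X100</MODELNUMBER>"
         "<VENDOR>Acme</VENDOR></TYPE></ITEM>")


def from_attributes(item: ET.Element) -> dict:
    """Read an ITEM written in the attribute layout."""
    type_el = item.find("TYPE")
    return {
        "serialNumber": item.get("serialNumber"),
        "location": item.get("location"),
        "barcode": item.get("barcode"),
        "modelNumber": type_el.get("modelNumber"),
        "vendor": type_el.get("vendor"),
    }


def from_elements(item: ET.Element) -> dict:
    """Read an ITEM written in the element layout (paths reach into TYPE)."""
    return {
        "serialNumber": item.findtext("SERIALNUMBER"),
        "location": item.findtext("LOCATION"),
        "barcode": item.findtext("BARCODE"),
        "modelNumber": item.findtext("TYPE/MODELNUMBER"),
        "vendor": item.findtext("TYPE/VENDOR"),
    }


print(from_attributes(ET.fromstring(ATTRS)) == from_elements(ET.fromstring(ELEMS)))
```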
The million-dollar question!
First off, don't worry too much about performance now. You will be amazed at how quickly an optimized XML parser will rip through your XML. More importantly, what is your design for the future: as the XML evolves, how will you maintain loose coupling and interoperability?
More concretely, you can make the content model of an element more complex, but it's harder to extend an attribute.
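For a file of around 80,000 items, one way to keep parsing cheap is to stream it rather than load the whole tree. This minimal sketch uses `iterparse` from Python's standard `xml.etree.ElementTree` and clears each ITEM after handling it. The `count_items` helper and the in-memory sample document are hypothetical; the processing step is left as a comment.

```python
import io
import xml.etree.ElementTree as ET


def count_items(source) -> int:
    """Stream through an INVENTORY document, one ITEM at a time.

    `source` is a filename or a binary file object."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "ITEM":
            count += 1
            # Process the item here, e.g. elem.get("serialNumber").
            elem.clear()  # drop the item's attributes and children once handled
    return count


if __name__ == "__main__":
    item = (b'<ITEM serialNumber="s" location="l" barcode="b">'
            b'<TYPE modelNumber="m" vendor="v"/></ITEM>')
    doc = b"<INVENTORY>" + item * 80_000 + b"</INVENTORY>"
    print(count_items(io.BytesIO(doc)))  # 80000
```

The same loop works for the element layout; only the per-item processing changes.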
How about taking advantage of our hard-earned object-orientation intuition? I usually find it straightforward to decide which thing is an object, which is an attribute of that object, and which object it refers to.
Whatever intuitively makes sense as an object should become an element. Its attributes (or properties) become attributes of that element in XML, or a child element with attributes.
I think for simpler cases like the one in the example, the object-orientation analogy works well enough to figure out what should be an element and what should be an attribute of an element.
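The object-orientation rule of thumb can be mechanized as a rough sketch. This assumes the simplest reading: scalar properties become attributes, and properties that are themselves objects become child elements. The dataclasses `Item` and `Type` and the `to_element` helper are hypothetical and simply mirror the question's sample.

```python
import dataclasses
import xml.etree.ElementTree as ET


@dataclasses.dataclass
class Type:
    modelNumber: str
    vendor: str


@dataclasses.dataclass
class Item:
    serialNumber: str
    location: str
    barcode: str
    type: Type


def to_element(obj) -> ET.Element:
    """Objects become elements; scalar properties become attributes;
    properties that are themselves objects become child elements."""
    elem = ET.Element(type(obj).__name__.upper())
    for field in dataclasses.fields(obj):
        value = getattr(obj, field.name)
        if dataclasses.is_dataclass(value):
            elem.append(to_element(value))
        else:
            elem.set(field.name, str(value))
    return elem


item = Item("S1", "L1", "B1", Type("M1", "V1"))
print(ET.tostring(to_element(item), encoding="unicode"))
```

Applied to the question's data, this rule reproduces the attribute-based layout.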