At work we have been asked to create XML files to pass data to another, offline application, which will then create a second XML file to pass back so that we can update some of our data. During the process we have been discussing the structure of the XML file with the other application's team.
The sample I came up with is essentially something like:
<INVENTORY>
    <ITEM serialNumber="something" location="something" barcode="something">
        <TYPE modelNumber="something" vendor="something"/>
    </ITEM>
</INVENTORY>
The other team said that this was not industry standard and that attributes should only be used for meta data. They suggested:
<INVENTORY>
    <ITEM>
        <SERIALNUMBER>something</SERIALNUMBER>
        <LOCATION>something</LOCATION>
        <BARCODE>something</BARCODE>
        <TYPE>
            <MODELNUMBER>something</MODELNUMBER>
            <VENDOR>something</VENDOR>
        </TYPE>
    </ITEM>
</INVENTORY>
The reason I suggested the first is that the resulting file is much smaller. There will be roughly 80,000 items in the file during transfer, and their suggestion in reality turns out to be three times larger than mine. I searched for the mysterious "Industry Standard" that was mentioned, but the closest I could find was advice that XML attributes should only be used for meta data, along with the remark that the real debate is over what actually counts as meta data.
After the long-winded explanation (sorry): how do you determine what is meta data, and when designing the structure of an XML document, how should you decide when to use an attribute or an element?
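To put a number on the size difference for your own data, a rough measurement is easy to script. The following is only a sketch: the helper names `attribute_form` and `element_form` are made up for illustration, every value is the placeholder "something" from the samples, and no indentation or line breaks are written. With real values and pretty-printing the ratio will differ, so treat the printed figure as indicative only.

```python
# Hypothetical helpers: build both layouts from the question with placeholder values.

def attribute_form(n):
    """Return the attribute-based layout with n items, without whitespace."""
    item = (
        '<ITEM serialNumber="something" location="something" barcode="something">'
        '<TYPE modelNumber="something" vendor="something"/>'
        "</ITEM>"
    )
    return "<INVENTORY>" + item * n + "</INVENTORY>"


def element_form(n):
    """Return the element-based layout with n items, without whitespace."""
    item = (
        "<ITEM>"
        "<SERIALNUMBER>something</SERIALNUMBER>"
        "<LOCATION>something</LOCATION>"
        "<BARCODE>something</BARCODE>"
        "<TYPE>"
        "<MODELNUMBER>something</MODELNUMBER>"
        "<VENDOR>something</VENDOR>"
        "</TYPE>"
        "</ITEM>"
    )
    return "<INVENTORY>" + item * n + "</INVENTORY>"


if __name__ == "__main__":
    n = 80000  # item count mentioned in the question
    attr_bytes = len(attribute_form(n).encode("utf-8"))
    elem_bytes = len(element_form(n).encode("utf-8"))
    print(f"attributes: {attr_bytes} bytes")
    print(f"elements:   {elem_bytes} bytes")
    print(f"ratio:      {elem_bytes / attr_bytes:.2f}")
```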
Use elements for data and attributes for meta data (data about the element's data).
If an element keeps showing up as a predicate in your select strings, that is a good sign it should be an attribute. Likewise, if an attribute is never used as a predicate, then maybe it is not useful meta data.
Remember that XML is supposed to be machine readable, not human readable, and that for large documents XML compresses very well.
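As an illustration of what "showing up as a predicate" can look like, here is a small sketch that assumes the select strings are XPath-style paths and uses the limited XPath subset in Python's `xml.etree.ElementTree`. The sample values (`sn-1`, `warehouse-a`, and so on) and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Sample documents in both layouts; all values are made up for illustration.
ATTRIBUTE_DOC = """
<INVENTORY>
  <ITEM serialNumber="sn-1" location="warehouse-a" barcode="0001"/>
  <ITEM serialNumber="sn-2" location="warehouse-b" barcode="0002"/>
</INVENTORY>
"""

ELEMENT_DOC = """
<INVENTORY>
  <ITEM>
    <SERIALNUMBER>sn-1</SERIALNUMBER>
    <LOCATION>warehouse-a</LOCATION>
    <BARCODE>0001</BARCODE>
  </ITEM>
  <ITEM>
    <SERIALNUMBER>sn-2</SERIALNUMBER>
    <LOCATION>warehouse-b</LOCATION>
    <BARCODE>0002</BARCODE>
  </ITEM>
</INVENTORY>
"""


def serials_by_location_attr(xml_text, location):
    """Select items with a predicate on the location attribute."""
    root = ET.fromstring(xml_text)
    items = root.findall(f"./ITEM[@location='{location}']")
    return [item.get("serialNumber") for item in items]


def serials_by_location_elem(xml_text, location):
    """Select items with a predicate on the text of the LOCATION child."""
    root = ET.fromstring(xml_text)
    items = root.findall(f"./ITEM[LOCATION='{location}']")
    return [item.findtext("SERIALNUMBER") for item in items]


if __name__ == "__main__":
    print(serials_by_location_attr(ATTRIBUTE_DOC, "warehouse-a"))
    print(serials_by_location_elem(ELEMENT_DOC, "warehouse-a"))
```

Both forms can be queried, so this is a heuristic about what you filter on most often rather than a technical limitation.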
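If file size is the main worry, it may be worth measuring what compression does before deciding on the layout. The sketch below uses Python's standard `gzip` module; the `build_inventory` helper and its generated values are assumptions made for the example, not the real data, and the actual savings depend on how repetitive the real values are.

```python
import gzip


def build_inventory(n, use_attributes):
    """Build a test document with n items in either layout (values are synthetic)."""
    parts = ["<INVENTORY>"]
    for i in range(n):
        serial, location, barcode = f"SN{i:06d}", f"LOC{i % 50}", f"{i:012d}"
        if use_attributes:
            parts.append(
                f'<ITEM serialNumber="{serial}" location="{location}" '
                f'barcode="{barcode}"/>'
            )
        else:
            parts.append(
                f"<ITEM><SERIALNUMBER>{serial}</SERIALNUMBER>"
                f"<LOCATION>{location}</LOCATION>"
                f"<BARCODE>{barcode}</BARCODE></ITEM>"
            )
    parts.append("</INVENTORY>")
    return "".join(parts)


def raw_and_gzip_size(xml_text):
    """Return (uncompressed bytes, gzip-compressed bytes) for a document."""
    raw = xml_text.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


if __name__ == "__main__":
    for label, use_attributes in (("attributes", True), ("elements", False)):
        raw, packed = raw_and_gzip_size(build_inventory(80000, use_attributes))
        print(f"{label}: {raw} bytes raw, {packed} bytes gzipped")
```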
This is very clear in HTML, where the difference between attributes and markup can be clearly seen.
If you just have pure data as XML, the difference is less clear: data could stand between markup or in attributes.
=> Most data should stand between markup.
If you want to use attributes here, you could divide the data into two categories: data and "meta data", where meta data is not part of the record you want to present but covers things like "format version", "created date", etc.
One could also say: "Use attributes to characterize the tag, use tags to provide data itself."
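Applied to the inventory example, that split could look like the sketch below: file-level meta data goes into attributes on the root, and the record data goes between tags. The attribute names `formatVersion` and `created` and the function name are hypothetical, chosen only to mirror the "format version" and "created date" examples above.

```python
import xml.etree.ElementTree as ET


def build_inventory(records, format_version, created):
    """Serialize records with file-level meta data as root attributes."""
    # Meta data about the file itself: attributes on the root element.
    root = ET.Element(
        "INVENTORY", {"formatVersion": format_version, "created": created}
    )
    for record in records:
        item = ET.SubElement(root, "ITEM")
        # The record's own data: text between tags.
        for tag in ("SERIALNUMBER", "LOCATION", "BARCODE"):
            ET.SubElement(item, tag).text = record[tag]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [{"SERIALNUMBER": "sn-1", "LOCATION": "warehouse-a", "BARCODE": "0001"}]
    print(build_inventory(sample, "1.0", "2024-01-01"))
```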