At work we are being asked to create XML files to pass data to another offline application, which will then create a second XML file to pass back in order to update some of our data. During the process we have been discussing the structure of the XML file with the other application's team.
The sample I came up with is essentially something like:
<INVENTORY>
    <ITEM serialNumber="something" location="something" barcode="something">
        <TYPE modelNumber="something" vendor="something"/>
    </ITEM>
</INVENTORY>
The other team said that this was not industry standard and that attributes should only be used for meta data. They suggested:
<INVENTORY>
    <ITEM>
        <SERIALNUMBER>something</SERIALNUMBER>
        <LOCATION>something</LOCATION>
        <BARCODE>something</BARCODE>
        <TYPE>
            <MODELNUMBER>something</MODELNUMBER>
            <VENDOR>something</VENDOR>
        </TYPE>
    </ITEM>
</INVENTORY>
The reason I suggested the first is that the resulting file is much smaller. There will be roughly 80000 items in the file during transfer, and their suggestion turns out to be three times larger than mine. I searched for the mysterious "Industry Standard" that was mentioned, but the closest I could find was the advice that XML attributes should only be used for meta data, along with the observation that the real debate is over what actually counts as meta data.
After the long-winded explanation (sorry): how do you determine what is meta data, and when designing the structure of an XML document, how should you decide when to use an attribute or an element?
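For anyone who wants to check the size difference on their own data, here is a rough sketch using Python's standard xml.etree.ElementTree. The helper names and the placeholder values are made up for illustration, and the ratio you get depends heavily on how long your real values are, so treat the result as a measurement of this toy data only.

```python
import xml.etree.ElementTree as ET


def build_attribute_style(count):
    """Serialize `count` items using attributes (the first layout in the question)."""
    root = ET.Element("INVENTORY")
    for i in range(count):
        item = ET.SubElement(
            root, "ITEM",
            serialNumber=f"SN{i}", location="something", barcode=f"BC{i}",
        )
        ET.SubElement(item, "TYPE", modelNumber="something", vendor="something")
    return ET.tostring(root)


def build_element_style(count):
    """Serialize `count` items using child elements (the second layout)."""
    root = ET.Element("INVENTORY")
    for i in range(count):
        item = ET.SubElement(root, "ITEM")
        ET.SubElement(item, "SERIALNUMBER").text = f"SN{i}"
        ET.SubElement(item, "LOCATION").text = "something"
        ET.SubElement(item, "BARCODE").text = f"BC{i}"
        type_elem = ET.SubElement(item, "TYPE")
        ET.SubElement(type_elem, "MODELNUMBER").text = "something"
        ET.SubElement(type_elem, "VENDOR").text = "something"
    return ET.tostring(root)


attribute_bytes = len(build_attribute_style(1000))
element_bytes = len(build_element_style(1000))
print(attribute_bytes, element_bytes, element_bytes / attribute_bytes)
```

Neither output is pretty-printed here; indentation whitespace would add further to the element-style file.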
I use the following guidelines in my schema design with regards to attributes vs. elements:
The preference for attributes is it provides the following:
I added "when technically possible" because there are times when the use of attributes is not possible, for example attribute set choices: requiring (startDate and endDate) xor (startTS and endTS) is not possible with the current schema language.
If XML Schema starts allowing the "all" content model to be restricted or extended then I would probably drop it.
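Since the (startDate and endDate) xor (startTS and endTS) rule cannot be stated in the schema, one workaround is to check it in application code after parsing. A minimal sketch in Python follows; the EVENT element name and the function name are invented for the example, and only the four attribute names come from the rule above.

```python
import xml.etree.ElementTree as ET


def has_exactly_one_range(elem):
    """Return True if elem carries a complete date pair or a complete
    timestamp pair, but nothing from the other pair."""
    attrs = elem.attrib
    has_date_pair = "startDate" in attrs and "endDate" in attrs
    has_ts_pair = "startTS" in attrs and "endTS" in attrs
    any_date = "startDate" in attrs or "endDate" in attrs
    any_ts = "startTS" in attrs or "endTS" in attrs
    return (has_date_pair and not any_ts) or (has_ts_pair and not any_date)


# EVENT is a hypothetical element used only to exercise the check.
date_only = ET.fromstring('<EVENT startDate="2020-01-01" endDate="2020-01-31"/>')
mixed = ET.fromstring('<EVENT startDate="2020-01-01" endTS="1580428800"/>')
print(has_exactly_one_range(date_only), has_exactly_one_range(mixed))
```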
Just a couple of corrections to some bad info:
@John Ballinger: Attributes can contain any character data. < > & " ' need to be escaped to &lt; &gt; &amp; &quot; and &apos; , respectively. If you use an XML library, it will take care of that for you.
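To illustrate the "a library will take care of that" point, here is a small sketch with Python's xml.etree.ElementTree. The attribute value is invented to include all five special characters; the library escapes what it needs to when writing and undoes it when reading.

```python
import xml.etree.ElementTree as ET

# A made-up value containing < > & " and '
original_value = "Bay <3> & \"north\" 'dock'"

item = ET.Element("ITEM", location=original_value)
serialized = ET.tostring(item, encoding="unicode")
print(serialized)

# Parsing the serialized text gives back the unescaped value.
round_tripped = ET.fromstring(serialized).get("location")
print(round_tripped == original_value)
```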
Hell, an attribute can contain binary data such as an image, if you really want, just by base64-encoding it and making it a data: URL.
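A quick sketch of the base64 idea, again in Python. The photo attribute name and the sample bytes are hypothetical; a real image would just be a longer byte string.

```python
import base64
import xml.etree.ElementTree as ET

# Arbitrary binary payload standing in for image bytes.
raw_bytes = bytes([0, 255, 16, 32, 60, 62, 38])

# Build a data: URL and store it in a (hypothetical) photo attribute.
encoded = base64.b64encode(raw_bytes).decode("ascii")
data_url = "data:application/octet-stream;base64," + encoded
item = ET.Element("ITEM", photo=data_url)
xml_text = ET.tostring(item, encoding="unicode")

# Read it back: take everything after the first comma and decode it.
recovered_url = ET.fromstring(xml_text).get("photo")
recovered_bytes = base64.b64decode(recovered_url.split(",", 1)[1])
print(recovered_bytes == raw_bytes)
```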
@feenster: Attributes can contain multiple space-separated items in the case of the IDREFS or NMTOKENS attribute types, and NMTOKENS would include numbers. Nitpicky, but this can end up saving space.
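Even without a DTD or schema declaring the attribute type, reading a space-separated list out of an attribute is a one-liner on the consuming side. A small sketch, where the relatedSerials attribute is made up for the example:

```python
import xml.etree.ElementTree as ET

# relatedSerials is a hypothetical attribute holding several values.
elem = ET.fromstring('<ITEM relatedSerials="1001 1002  1003"/>')

# split() with no argument collapses any run of whitespace.
serials = elem.get("relatedSerials").split()
print(serials)
```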
Using attributes can keep XML competitive with JSON. See Fat Markup: Trimming the Fat Markup Myth one calorie at a time.
Both methods for storing an object's properties are perfectly valid. You should start from pragmatic considerations. Try answering the following question:
Does readability matter?
...
It's largely a matter of preference. I use Elements for grouping and attributes for data where possible as I see this as more compact than the alternative.
For example I prefer.....
...Instead of....
However, if I have data which does not fit easily inside of, say, 20-30 characters, or which contains many quotes or other characters that need escaping, then I'd say it's time to break out the elements... possibly with CDATA blocks.
Attributes can easily become difficult to manage over time, trust me. I always stay away from them personally. Elements are far more explicit and readable/usable by both parsers and users.
The only time I've ever used them was to define the file extension of an asset URL:
I guess if you know 100% that the attribute will not need to be expanded you could use them, but how often do you know that?
When in doubt, KISS -- why mix attributes and elements when you don't have a clear reason to use attributes? If you later decide to define an XSD, that will end up being cleaner as well. Then if you even later decide to generate a class structure from your XSD, that will be simpler as well.