At work we are being asked to create XML files to pass data to another offline application, which will then create a second XML file to pass back in order to update some of our data. During the process we have been discussing the structure of the XML file with the other application's team.
The sample I came up with is essentially something like:
<INVENTORY>
<ITEM serialNumber="something" location="something" barcode="something">
<TYPE modelNumber="something" vendor="something"/>
</ITEM>
</INVENTORY>
The other team said that this was not industry standard and that attributes should only be used for metadata. They suggested:
<INVENTORY>
<ITEM>
<SERIALNUMBER>something</SERIALNUMBER>
<LOCATION>something</LOCATION>
<BARCODE>something</BARCODE>
<TYPE>
<MODELNUMBER>something</MODELNUMBER>
<VENDOR>something</VENDOR>
</TYPE>
</ITEM>
</INVENTORY>
The reason I suggested the first is that the resulting file is much smaller. There will be roughly 80000 items in the file during transfer. Their suggestion turns out to be three times larger than mine. I searched for the mysterious "Industry Standard" that was mentioned, but the closest I could find was the advice that XML attributes should only be used for metadata, along with the caveat that the real debate is about what actually counts as metadata.
After the long-winded explanation (sorry): how do you determine what is metadata, and when designing the structure of an XML document, how should you decide when to use an attribute or an element?
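For anyone who wants to check the size difference themselves, here is a rough sketch that builds both layouts and compares their serialized sizes. Python, the ITEM_COUNT value and the generated SN/BC values are all assumptions made for illustration; the actual ratio depends on the real values and on whether the file is indented.

```python
import xml.etree.ElementTree as ET

ITEM_COUNT = 1000  # placeholder; the real transfer has roughly 80000 items


def build_attribute_style(count):
    """Serialize the inventory with item data stored in attributes."""
    root = ET.Element("INVENTORY")
    for i in range(count):
        item = ET.SubElement(
            root, "ITEM",
            serialNumber=f"SN{i}", location="something", barcode=f"BC{i}",
        )
        ET.SubElement(item, "TYPE", modelNumber="something", vendor="something")
    return ET.tostring(root, encoding="utf-8")


def build_element_style(count):
    """Serialize the same data with every value in its own child element."""
    root = ET.Element("INVENTORY")
    for i in range(count):
        item = ET.SubElement(root, "ITEM")
        ET.SubElement(item, "SERIALNUMBER").text = f"SN{i}"
        ET.SubElement(item, "LOCATION").text = "something"
        ET.SubElement(item, "BARCODE").text = f"BC{i}"
        type_el = ET.SubElement(item, "TYPE")
        ET.SubElement(type_el, "MODELNUMBER").text = "something"
        ET.SubElement(type_el, "VENDOR").text = "something"
    return ET.tostring(root, encoding="utf-8")


attr_size = len(build_attribute_style(ITEM_COUNT))
elem_size = len(build_element_style(ITEM_COUNT))
print(f"attribute style: {attr_size} bytes")
print(f"element style:   {elem_size} bytes")
print(f"ratio:           {elem_size / attr_size:.2f}")
```

ET.tostring writes no indentation, so a pretty-printed file would be larger in both layouts.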
XML Element vs XML Attribute
XML is all about agreement. First defer to any existing XML schemas or established conventions within your community or industry.
If you are truly in a situation to define your schema from the ground up, here are some general considerations that should inform the element vs attribute decision:
I agree with feenster. Stay away from attributes if you can. Elements are evolution friendly and more interoperable between web service toolkits. You'd never find these toolkits serializing your request/response messages using attributes. This also makes sense since our messages are data (not metadata) for a web service toolkit.
I am always surprised by the results of these kinds of discussions. To me there is a very simple rule for deciding whether data belongs in an attribute or as content and that is whether the data has navigable sub-structure.
So for example, non-markup text always belongs in attributes. Always.
Lists belong in sub-structure or content. Text which may over time include embedded structured sub-content belongs in content. (In my experience there is relatively little of this - text with markup - when using XML for data storage or exchange.)
XML schema written this way is concise.
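To illustrate why lists end up in sub-structure, here is a small sketch using Python's standard xml.etree.ElementTree. The idea of an ITEM with two LOCATION values is a hypothetical borrowed from the question's sample, not something the question requires. Repeated child elements parse fine, while repeating an attribute name on one element is not well-formed XML.

```python
import xml.etree.ElementTree as ET

# A list of values fits naturally as repeated child elements.
with_children = ET.fromstring(
    "<ITEM serialNumber='something'>"
    "<LOCATION>something</LOCATION>"
    "<LOCATION>something else</LOCATION>"
    "</ITEM>"
)
locations = [loc.text for loc in with_children.findall("LOCATION")]
print(locations)

# The same list cannot be expressed by repeating an attribute:
# a duplicate attribute name makes the document ill-formed.
try:
    ET.fromstring("<ITEM location='something' location='something else'/>")
    duplicate_rejected = False
except ET.ParseError as err:
    duplicate_rejected = True
    print("rejected:", err)
```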
Whenever I see cases like
<car><make>Ford</make><color>Red</color></car>
I think to myself, "Gee, did the author think that there were going to be sub-elements within the make element?"
<car make="Ford" color="Red" />
is significantly more readable, and there is no question about how whitespace would be handled. Given just the whitespace handling rules, I believe this was the clear intent of the XML designers.
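As a rough illustration of the whitespace point, the sketch below reads both car forms with Python's xml.etree.ElementTree (the choice of parser is an assumption; other toolkits behave similarly but not identically). It also parses an indented variant of the element form, which is made up here to show that the indentation becomes part of the element's text.

```python
import xml.etree.ElementTree as ET

as_elements = ET.fromstring("<car><make>Ford</make><color>Red</color></car>")
as_attributes = ET.fromstring('<car make="Ford" color="Red" />')

# Both forms carry the same value when written compactly.
elem_make = as_elements.findtext("make")
attr_make = as_attributes.get("make")
print(elem_make, attr_make)

# Hypothetical pretty-printed variant: the line breaks and indentation
# inside <make> are returned as part of its text content.
indented = ET.fromstring("<car>\n  <make>\n    Ford\n  </make>\n</car>")
indented_make = indented.findtext("make")
print(repr(indented_make))

# The consumer has to decide whether to strip it.
print(repr(indented_make.strip()))
```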
There is no universal answer to this question (I was heavily involved in the creation of the W3C spec). XML can be used for many purposes - text-like documents, data and declarative code are three of the most common. I also use it a lot as a data model. There are aspects of these applications where attributes are more common and others where child elements are more natural. There are also features of various tools that make it easier or harder to use them.
XHTML is one area where attributes have a natural use (e.g. in class='foo'). Attributes have no order and this may make it easier for some people to develop tools. OTOH attributes are harder to type without a schema. I also find namespaced attributes (foo:bar="zork") are often harder to manage in various toolsets. But have a look at some of the W3C languages to see the mixture that is common. SVG, XSLT, XSD, MathML are some examples of well-known languages and all have a rich supply of attributes and elements. Some languages even allow more-than-one-way to do it, e.g.
or
(Note that these are NOT equivalent syntactically and require explicit support in processing tools.)
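On the earlier point that attributes have no order while child elements do, here is a minimal sketch with Python's xml.etree.ElementTree. The item, a and b names are invented for the example. Two documents that differ only in attribute order compare equal once parsed, but two that differ in child order do not.

```python
import xml.etree.ElementTree as ET

# Same attributes, written in a different order.
attrs_1 = ET.fromstring('<item a="1" b="2"/>')
attrs_2 = ET.fromstring('<item b="2" a="1"/>')
same_attributes = attrs_1.attrib == attrs_2.attrib
print("attributes equal:", same_attributes)

# Same child elements, written in a different order.
kids_1 = ET.fromstring("<item><a>1</a><b>2</b></item>")
kids_2 = ET.fromstring("<item><b>2</b><a>1</a></item>")
order_1 = [child.tag for child in kids_1]
order_2 = [child.tag for child in kids_2]
print("child order equal:", order_1 == order_2)
```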
My advice would be to have a look at common practice in the area closest to your application and also consider what toolsets you may wish to apply.
Finally, make sure that you differentiate namespaces from attributes. Some XML systems (e.g. LINQ) represent namespaces as attributes in the API. IMO this is ugly and potentially confusing.
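For contrast, here is a small sketch of an API that keeps the two apart, using Python's xml.etree.ElementTree. The urn:example:foo namespace URI and the plain attribute are made up for the example. The xmlns:foo declaration does not show up among the element's attributes, and the namespaced attribute is addressed by its expanded name rather than by its prefix.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<item xmlns:foo="urn:example:foo" foo:bar="zork" plain="value"/>'
)

# The namespace declaration is consumed by the parser; only real
# attributes remain, with the prefix replaced by {namespace-uri}.
attribs = dict(doc.attrib)
print(attribs)

# Lookup uses the expanded name, not the prefix used in the file.
by_expanded_name = doc.get("{urn:example:foo}bar")
by_prefix = doc.get("foo:bar")
print(by_expanded_name, by_prefix)
```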
"XML" stands for "eXtensible Markup Language". A markup language implies that the data is text, marked up with metadata about structure or formatting.
XHTML is an example of XML used the way it was intended:
Here, the distinction between elements and attributes is clear. Text elements are displayed in the browser, and attributes are instructions about how to display them (although there are a few tags that don't work that way).
Confusion arises when XML is used not as a markup language, but as a data serialization language, in which the distinction between "data" and "metadata" is more vague. So the choice between elements and attributes is more-or-less arbitrary except for things that can't be represented with attributes (see feenster's answer).
Others have covered how to differentiate attributes from elements, but from a more general perspective, putting everything in attributes because it makes the resulting XML smaller is wrong.
XML is not designed to be compact but to be portable and human readable. If you want to decrease the size of the data in transit then use something else (such as Google's protocol buffers).