Yes, XML is human-readable, but so are comma-delimited text and properties files.
XML is bloated, hard to parse, and hard to modify in code, plus it has a ton of other problems I can think of.
My question is: what are XML's most attractive qualities that have made it so popular?
XML is not hard to parse; in fact, it's quite simple, given the volume of excellent APIs available for every language under the sun.
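To make the "not hard to parse" point concrete, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree module. The config document and the read_settings helper are hypothetical, made up for illustration; other mainstream XML APIs look broadly similar.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings document, used only for illustration.
DOC = """<config>
  <server host="example.org" port="8080"/>
  <timeout unit="s">30</timeout>
</config>"""

def read_settings(xml_text: str) -> dict:
    """Parse the document and copy a few values into a plain dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    server = root.find("server")    # first <server> child of <config>
    timeout = root.find("timeout")  # first <timeout> child of <config>
    return {
        "host": server.get("host"),
        "port": int(server.get("port")),
        "timeout_s": int(timeout.text),
    }

print(read_settings(DOC))  # {'host': 'example.org', 'port': 8080, 'timeout_s': 30}
```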
XML itself is not bloated; it can be as concise as necessary, but it's up to your schema to keep it that way.
XML handles hierarchical datasets in a way that comma-delimited text never could or should.
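A small sketch of the hierarchy point, again using Python's xml.etree.ElementTree plus the csv module. The customer/order/item data is hypothetical. The XML keeps the parent-child relationships in its nesting, while the CSV version has to flatten them by repeating each parent's key on every row.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical nested data: customers own orders, orders own line items.
DOC = """<customers>
  <customer id="c1">
    <order id="o1">
      <item sku="A" qty="2"/>
      <item sku="B" qty="1"/>
    </order>
  </customer>
  <customer id="c2">
    <order id="o2"><item sku="A" qty="5"/></order>
    <order id="o3"><item sku="C" qty="1"/></order>
  </customer>
</customers>"""

root = ET.fromstring(DOC)

# In XML the nesting itself carries the relationships.
for customer in root.findall("customer"):
    n_items = len(customer.findall("order/item"))
    print(customer.get("id"), "has", n_items, "line items")

# In CSV every row must repeat its parents' keys, because a flat
# row-column table has no notion of "child of".
rows = [
    (c.get("id"), o.get("id"), i.get("sku"), i.get("qty"))
    for c in root.findall("customer")
    for o in c.findall("order")
    for i in o.findall("item")
]
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["customer", "order", "sku", "qty"])
writer.writerows(rows)
print(buf.getvalue())
```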
XML is self-describing and human-readable. Why is it a standard? Well, first and foremost, because it can be standardized. CSV isn't (and can't be) a standard because there's an infinite amount of variation.
It has many advantages and few shortcomings. The main problems are larger file sizes and slower processing. However, there are advantages:
XML's popularity derives from other markup languages. HTML is the one people are most familiar with, but increasingly we also see lightweight "markdown" languages like the ones used by wikis and even the Stack Overflow post form.
HTML did an interesting job of formatting text, but it was insufficient. It grew. Folks wanted to add tags for everything (<BLINK>, anyone?): layouts, styles, and even data.
XML is the eXtensible Markup Language (duh, right?). It was designed so that anyone could create their own tags, so that your RECORD tag doesn't interfere with my RECORD tag if they have different meanings, and with sensitivity to the encoding, tag-matching, and escaping issues that HTML has.
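The RECORD example maps onto XML namespaces (specified separately in the W3C "Namespaces in XML" recommendation). This minimal sketch, assuming two hypothetical namespace URIs, shows two RECORD elements coexisting in one document, and then shows Python's ElementTree escaping special characters on output so nobody has to do it by hand.

```python
import xml.etree.ElementTree as ET

# Two vendors both use a RECORD tag; the (hypothetical) namespace URIs keep them apart.
DOC = """<batch xmlns:hr="urn:example:hr" xmlns:med="urn:example:medical">
  <hr:RECORD>employee 42</hr:RECORD>
  <med:RECORD>patient 42</med:RECORD>
</batch>"""

root = ET.fromstring(DOC)
NS = {"hr": "urn:example:hr", "med": "urn:example:medical"}  # prefix -> URI for lookups

hr_record = root.find("hr:RECORD", NS)
med_record = root.find("med:RECORD", NS)
print(hr_record.tag)   # {urn:example:hr}RECORD
print(med_record.tag)  # {urn:example:medical}RECORD

# Escaping is handled by the serializer.
note = ET.Element("note")
note.text = "a < b & c"
print(ET.tostring(note, encoding="unicode"))  # <note>a &lt; b &amp; c</note>
```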
At the start, it was popular with people who already knew HTML and liked the familiar concept of using markup to organize their data.
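Straight from the horse's mouth (the W3C XML 1.0 specification), the design goals of XML were:
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.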
It became popular because people needed a standard, cross-platform data exchange format. XML may be a bit bloated, but it is a very simple way to delimit text data, and it was backwards compatible with the large body of existing SGML systems.
You really can't compare XML to CSV because CSV is an extremely limited way of representing data. CSV cannot handle anything outside of a basic row-column table and has no notion of hierarchy.
XML is not that hard to parse, and once you write or find a decent XML utility, it's not difficult to deal with in code either.
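On the "hard to modify in code" complaint, this sketch uses Python's ElementTree to change an attribute, append a child element, and write the tree back out. The inventory document and the restock helper are hypothetical; a real program would also need to handle SKUs containing quote characters, which this simple path expression does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory document.
DOC = """<inventory>
  <product sku="A" stock="3"/>
  <product sku="B" stock="0"/>
</inventory>"""

def restock(xml_text: str, sku: str, amount: int) -> str:
    """Increase one product's stock attribute and return the updated XML (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    product = root.find(f"product[@sku='{sku}']")  # ElementPath attribute predicate
    if product is None:
        raise KeyError(sku)
    product.set("stock", str(int(product.get("stock")) + amount))
    # Record the change as a new child element.
    ET.SubElement(product, "note").text = f"restocked by {amount}"
    return ET.tostring(root, encoding="unicode")

updated = restock(DOC, "B", 10)
print(updated)
```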
It was the late '90s, and the Internet was hot, hot, hot, but companies had systems that couldn't get anywhere near the Internet. They had spent countless hours dealing with CORBA and were planning to use Enterprise JavaBeans to get these older systems communicating with their newer ones.
Along comes SGML, the precursor to almost all markup languages (I'm skipping GML). SGML was already used to define HTML, but HTML had particular tags that HAD to be used for Netscape to display a given web page properly.
But what if we had other data that needed to be explained? Ah ha!
So given that XML is structured, and you are free to define that structure, it naturally lets you build interfaces (in the non-OO sense). It doesn't really do anything that other interface languages couldn't already do, but it gave people the ability to design their own definitions.
Interface languages like X12 and HL7 certainly existed, but with XML, people could tailor the format to their individual AIX or AS/400 systems.
And with tag-based markup already dominant thanks to HTML, it was only natural that XML would be pushed to the forefront because of its ease of use.
The primary advantage it bestows is a system-independent representation of hierarchical data. Comma-delimited text and properties files are more appropriate in many places where XML was used, but its ability to represent complex data structures and data types, its character-set awareness, and its standards document allowed it to be used as a good inter-application exchange format.
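Character-set awareness is easy to demonstrate: an XML document can declare its own encoding, and a conforming parser uses that declaration to decode the raw bytes. In this sketch (Python's ElementTree, made-up data), the same text encoded as ISO-8859-1 and as UTF-8 gives different bytes but parses to the same string. A plain CSV or properties file has no standard in-band way to say which encoding it uses.

```python
import xml.etree.ElementTree as ET

# The same text in two encodings; each XML declaration tells the parser how to decode its bytes.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><city>Zürich</city>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><city>Zürich</city>'.encode("utf-8")

print(latin1_doc == utf8_doc)  # False: the raw bytes differ

a = ET.fromstring(latin1_doc).text  # decoded using the declared ISO-8859-1
b = ET.fromstring(utf8_doc).text    # decoded using the declared UTF-8
print(a == b == "Zürich")  # True
```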
My minor suggestion for improving the language is to change the way end tags work. Just imagine how much bandwidth and disk space would be saved if you could end a tag with </>, like <my_tag>blah</> instead of <my_tag>blah</my_tag>. You aren't allowed to have overlapping tags, so the name in the end tag is always implied, and I don't know why the standard insists on more text than it needs. In fact, why use angle brackets at all? The ugliness of the angle brackets highlights what it could have been: JSON. JavaScript Object Notation achieves most of the goals of XML with a lot less typing. Another alternative syntax that makes XML bearable is the Builder syntax, as used by Groovy and Ruby. It's much more natural and readable.
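Two quick checks on this suggestion, sketched in Python with the standard library: a conforming XML parser rejects the proposed </> shorthand as not well-formed, and the same small (hypothetical) record serialized as compact JSON is shorter than its XML form. The is_well_formed helper is made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the standard-library parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<my_tag>blah</my_tag>"))  # True
print(is_well_formed("<my_tag>blah</>"))        # False: </> is not a valid end tag

# The same hypothetical record as XML and as compact JSON.
record = {"name": "Ada", "language": "English"}
as_xml = "<record><name>Ada</name><language>English</language></record>"
as_json = json.dumps(record, separators=(",", ":"))  # no spaces after separators
print(as_json)  # {"name":"Ada","language":"English"}
print(len(as_xml), len(as_json))
```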