Before XML became a standard and given all its sho

Posted 2020-05-23 08:19

Question:

Yes, XML is human-readable, but so are comma-delimited text and properties files.

XML is bloated, hard to parse, and hard to modify in code, plus it has a ton of other problems I can think of.

My question is: what are XML's most attractive qualities that have made it so popular?

Answer 1:

One of the major advantages it has over things like CSV files is that it can represent hierarchical data easily. To do this you either need a self-describing tree structure like XML, or a pre-defined format such as SWIFT or EDI (and if you've ever dealt with either of those, then you'll realise that XML is trivial to parse in comparison).
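A minimal Python sketch of the hierarchy point, using the standard library's xml.etree.ElementTree and a made-up order document (the element names, attribute names and the order_items helper are all hypothetical). The line items nest inside the order they belong to; a CSV version would have to repeat the order id and customer on every row or split the data across two files.

```python
import xml.etree.ElementTree as ET

# Hypothetical order: the line items are nested inside the order they belong to.
ORDER_XML = """<order id="1001">
  <customer>Acme Ltd</customer>
  <items>
    <item sku="A1" qty="2"/>
    <item sku="B7" qty="1"/>
  </items>
</order>"""


def order_items(xml_text):
    """Return (order id, customer name, [(sku, qty), ...]) from the nested document."""
    root = ET.fromstring(xml_text)
    items = [(item.get("sku"), int(item.get("qty"))) for item in root.find("items")]
    return root.get("id"), root.findtext("customer"), items


print(order_items(ORDER_XML))
```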

One of the reasons it's actually quite easy to parse is because it's 'bloated'. Those end tags mean that you can accurately match the end of elements to the start and work out when the tree has become unbalanced. You can't do that in the 'lightweight' alternatives such as JSON.
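As a rough illustration with Python's xml.etree.ElementTree (the check_balanced helper is hypothetical), the parser rejects an improperly nested document because each end tag names the element it is supposed to close:

```python
import xml.etree.ElementTree as ET


def check_balanced(xml_text):
    """Return None if the document is well-formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)  # includes the line and column where parsing failed
    return None


print(check_balanced("<a><b>text</b></a>"))  # None
print(check_balanced("<a><b>text</a></b>"))  # reports a mismatched tag
```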

Another reason it's easy to parse is because it has had full support for Unicode encodings from the start, so you don't have to worry about what the default code page is on the target system, or how to encode multi-byte characters, because that information is all contained within the document.
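A small Python sketch of the self-declared encoding, assuming the standard library's expat-based xml.etree.ElementTree. The same Latin-1 bytes parse when the XML declaration names the encoding, and are rejected when the declaration is missing, because XML then defaults to UTF-8 (absent a byte order mark). The sample text is made up.

```python
import xml.etree.ElementTree as ET

# Bytes encoded as Latin-1 ("ü" is the single byte 0xFC); the declaration says so.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><city>Zürich</city>'.encode("latin-1")
print(ET.fromstring(latin1_doc).text)  # Zürich

# Drop the declaration and the parser assumes UTF-8, in which 0xFC is invalid.
undeclared = "<city>Zürich</city>".encode("latin-1")
try:
    ET.fromstring(undeclared)
except ET.ParseError as err:
    print("rejected:", err)
```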

And let's not forget about the other artefacts that came with it like the defined description and validation mechanism (XSD) and the powerful and declarative transformation mechanism (XSLT).



Answer 2:

It was the late 90s and the internet was hot hot hot, but companies had systems that couldn't get anywhere near the internet. They had spent countless hours dealing with CORBA and were planning to use Enterprise JavaBeans to get these older systems communicating with their newer systems.

Along comes SGML, which is the precursor to almost all markup languages (I'm skipping GML). SGML was already used to define HTML, but HTML had particular tags that HAD to be used in order for Netscape to properly display a given webpage.

But what if we had other data that needed to be explained? Ah ha!

Since XML is structured, and you are free to define that structure, it naturally allows you to build interfaces (from a non-OO point of view). It doesn't really do anything that other interface languages didn't already do, but it gave people the ability to design their own definitions.

Interface languages like X12 and HL7 existed for sure, but with XML people could tailor it to their individual AIX or AS/400 systems.

And with tag-based languages already dominant thanks to HTML, it was only natural that XML would be pushed to the forefront because of its ease of use.



Answer 3:

Straight from the horse's mouth, the design goals of XML were:

  1. XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.

It became popular because people needed a standard cross-platform data exchange format. XML may be a bit bloated, but it is a very simple way to delimit text data, and it was backwards compatible with the large body of existing SGML systems.

You really can't compare XML to CSV because CSV is an extremely limited way of representing data. CSV cannot handle anything outside of a basic row-column table and has no notion of hierarchy.

XML is not that hard to parse and once you write or find a decent XML utility it's not difficult to deal with in code either.



Answer 4:

XML is not hard to parse; in fact, it's quite simple, given the volume of excellent APIs available for every language under the sun.

XML itself is not bloated; it can be as concise as necessary, but it's up to your schema to keep it that way.

XML handles hierarchical datasets in a way that comma-delimited text never could or should.

XML is self-documenting/describing, and human readable. Why is it a standard? Well, first and foremost, because it can be standardized. CSV isn't (and can't be) a standard because there's an infinite amount of variation.



Answer 5:

It has many advantages and few shortcomings. The main problems are the increased file size and slower processing. However, there are advantages:

  • it is structured, so you write a parser only once
  • it supports data with nested structure (hierarchies, trees, etc.)
  • you can embed multiple types of data structure in a single XML
  • you can describe the schema (data types, etc.) with a standard language (XSD, DTD...)


Answer 6:

  • You can be given an XML file and have a chance at understanding what the data means just by reading it, without needing a separate specification as you did with pre-XML data formats.
  • Tools can be used to work with XML generically. Before, when everybody used different file formats (comma-separated, binary, etc.), you'd need to write a custom tool.
  • You can extend it by adding a new tag to the schema with a default value. If done correctly, that doesn't break the old code that parses the XML but doesn't know about the tag. That usually isn't true with proprietary formats.
  • Probably the main thing that makes it popular is that it looks a bit like HTML, which lots of people already understood. So it became popular, and then, because it was popular, it became more popular, because it's nice to work with one standard that everybody knows.
  • A bad thing is that XML is usually a lot bigger than the formats that used to be used, because of all the tags and because it's text-based. But as computers are bigger now, we can often handle that, and it's worth trading size for better self-describing data.
  • You can get off the shelf code/libraries that will parse/write xml.
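A minimal sketch of the extensibility point, assuming Python's xml.etree.ElementTree and a hypothetical settings format (read_settings, <host>, <port> and <timeout> are made up): a reader written for the old layout keeps working when a newer writer adds an element it doesn't know about.

```python
import xml.etree.ElementTree as ET


def read_settings(xml_text):
    """Hypothetical version-1 reader: it only knows about <host> and <port>."""
    root = ET.fromstring(xml_text)
    # findtext() with a default also copes with an element being absent.
    return root.findtext("host"), int(root.findtext("port", default="80"))


v1 = "<settings><host>example.org</host><port>8080</port></settings>"
# A newer writer added <timeout>; the old reader never asks for it, so nothing breaks.
v2 = ("<settings><host>example.org</host><port>8080</port>"
      "<timeout>30</timeout></settings>")

print(read_settings(v1))  # ('example.org', 8080)
print(read_settings(v2))  # ('example.org', 8080)
```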


Answer 7:

How about the fact that it supports a standardized query language, XPath? That's pretty useful for me.
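For a taste of it in Python: the standard library's ElementTree implements only a limited subset of XPath (full XPath 1.0 needs a third-party library such as lxml), but that subset covers simple attribute-filtered queries like the one below. The document is made up.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<library>
  <book year="1998"><title>Alpha</title></book>
  <book year="2006"><title>Beta</title></book>
  <book year="1998"><title>Gamma</title></book>
</library>"""

root = ET.fromstring(LIBRARY)
# XPath-style path with an attribute predicate: titles of books from 1998.
titles = [t.text for t in root.findall("./book[@year='1998']/title")]
print(titles)  # ['Alpha', 'Gamma']
```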



Answer 8:

XML provides a very straightforward way to represent data. Parsing is fairly easy: it's a very regular grammar and lends itself to straightforward recursive-descent parsing. This makes it easy for data consumers and producers to exchange information without really having to know too much about their respective applications and internals.
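To illustrate the recursive-descent point, here is a toy parser sketch in Python for a deliberately tiny subset of XML: elements and text only, with no attributes, comments, entities or self-closing tags. The names Node, tokenize, parse_element and parse are hypothetical, and this is nowhere near a conforming XML parser; it only shows how matching end tags let each nested element be handled by one recursive call.

```python
import re
from dataclasses import dataclass, field

# A token is a start tag, an end tag, or a run of text (no attributes supported).
TOKEN = re.compile(r"</?[A-Za-z_][\w.-]*>|[^<]+")


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)  # Node objects and text strings


def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected markup at offset {pos}")
        tokens.append(m.group(0))
        pos = m.end()
    return tokens


def parse_element(tokens, i):
    """Parse the element whose start tag is tokens[i]; return (node, next index)."""
    if i >= len(tokens) or not tokens[i].startswith("<") or tokens[i].startswith("</"):
        raise ValueError("expected a start tag")
    node = Node(tokens[i][1:-1])
    i += 1
    while i < len(tokens) and not tokens[i].startswith("</"):
        if tokens[i].startswith("<"):
            child, i = parse_element(tokens, i)  # recurse into the nested element
            node.children.append(child)
        else:
            node.children.append(tokens[i])  # text content
            i += 1
    # The end tag must name the element we opened; this is what catches imbalance.
    if i == len(tokens) or tokens[i][2:-1] != node.tag:
        raise ValueError(f"<{node.tag}> is not closed by </{node.tag}>")
    return node, i + 1


def parse(text):
    tokens = tokenize(text)
    root, i = parse_element(tokens, 0)
    if i != len(tokens):
        raise ValueError("content after the root element")
    return root


print(parse("<a><b>hi</b>there</a>"))
```

Feeding it "<a><b>hi</a></b>" raises ValueError at the first mismatched end tag.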

It is, however, an extremely inefficient way to represent data, and it lends itself to being abused horribly. One example is an object interface I worked with that, instead of exporting constructors and properties for particular objects, required me to author XML programmatically and pass the resulting XML to the single constructor. Similarly, XML does not lend itself well to large data sets that may require random access unless you add a cataloging system. For example, if I have a thousand-page document in XML, I will need to parse nearly the entire file to get to page 999 (assuming the page data is ordered), so I'd be better off putting the actual page data in a separate file or files and using the XML to point to the correct file or position within a file.



Answer 9:

Do you remember the days before XML became popular? Data just wasn't easily interchangeable: one program would take .csv files, the next .xls, the next EBCDIC-formatted files. XML has its weaknesses, but it's structured, which makes it parsable and transformable.

As you point out, CSV files are pretty portable. However, there's no meaning attached to them. What does column(14) mean to me, as opposed to <customer id="14"/>?
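A quick Python comparison of the two (the record is hypothetical): the CSV reader hands back positional strings whose meaning has to come from somewhere else, while the XML attribute carries its name with it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record: in CSV, what each field means lives outside the file.
row = next(csv.reader(io.StringIO("14,Jane Doe,London\n")))
print(row[0])  # prints 14, but is that a customer id, an order number, an age?

# In XML, the name travels with the value.
customer = ET.fromstring('<customer id="14" name="Jane Doe" city="London"/>')
print(customer.get("id"))  # prints 14, explicitly labelled as the customer's id
```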



Answer 10:

Some inherent qualities of XML that make it so popular and useful:

  1. XML represents a tree, and tree-like structures are a very common pattern in programming. This is an evolutionary leap from record-based representations like CSV, made possible by today's cheap computing power and bandwidth.

  2. XML strikes a good balance between human factors (it is plain text, and fairly legible) and computing practicalities (terseness, ease in parsing, expressiveness, extensibility, etc).



Answer 11:

Something I haven't seen mentioned yet is that not only is XML structured, but the way that attributes and elements interact creates a somewhat unusual structure that is still easily understandable by humans.

If you compare an XML tree with its nearest structural neighbor, the directed acyclic graph, you might note that the typical DAG carries only an ID and a value at each node. XML carries this as well (the gi/tag corresponding to the ID, and the text of the node corresponding to the value), but each node can also carry an arbitrary amount of additional metadata: the attributes. This is very much like having an extra dimension: if you consider the DAG as spreading out flat in two dimensions with each branch, the XML document spreads out in three dimensions, flat, and then downwards to a subtree containing just the attributes.

This is an optional bend to the structure. Walk a list of attributes like any list of child elements, and you're back to a two-dimensional tree. Ignore them completely, and you have a simplified node/value tree which may more purely represent the overall "shape" of contained data. But the extra dimension is there if you need the metadata.
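A small Python sketch of those two views, assuming xml.etree.ElementTree and a made-up photo element: iterating the children gives the plain tag/text tree, while the attributes sit in a separate mapping you can consult or ignore.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with both attributes (metadata) and child elements (data).
photo = ET.fromstring(
    '<photo id="7" format="jpeg">'
    "<caption>Harbour</caption><taken>2009-06-01</taken>"
    "</photo>"
)

# Node/value view: ignore the attributes and walk tags and text.
print([(child.tag, child.text) for child in photo])

# The "extra dimension": attributes hang off the same node as a separate mapping.
print(photo.attrib)
```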

With decent indentation, this is something that a human being can pick up just by eyeballing the raw data, making XML a miniature visualization tool for a potentially complex structure — and having a visualization tool built into the data exchange of your application means that the programmers involved are more likely to build a structure that represents the way the data is likely to be used.



Answer 12:

  1. Schema definition languages - you can describe the expected format of the XML
  2. It's a standard:) - it's definitely better than everybody using their own custom formats

CSV is human readable but that's really the only good thing about it - it's so inflexible, and there are no meanings assigned to the values. If I started designing a system now I would definitely use YAML instead - it's less bloated and it's definitely gaining momentum.



Answer 13:

It's structured.



Answer 14:

XML's popularity derives from other markup languages. HTML is the one people are most familiar with, but increasingly now we see "markdown" languages like those used by wikis and even the Stack Overflow post form.

HTML did an interesting job of formatting text, but it was insufficient. It grew. Folks wanted to add tags for everything. <BLINK> anyone? Layouts, styles, and even data.

XML is the extensible markup language (duh, right?), designed so that anyone could create their own tags, so that your RECORD tag doesn't interfere with my RECORD tag in case they have different meanings, and with sensitivity to the encoding, tag-matching and escaping issues that HTML has.
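The RECORD clash is handled by the companion Namespaces in XML recommendation. A minimal Python sketch with xml.etree.ElementTree; both namespace URIs and the element contents below are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both define a <record> element.
BATCH = """<batch xmlns:bank="urn:example:bank" xmlns:music="urn:example:music">
  <bank:record>transfer 42</bank:record>
  <music:record>track 7</music:record>
</batch>"""

root = ET.fromstring(BATCH)
ns = {"bank": "urn:example:bank", "music": "urn:example:music"}

print(root.find("bank:record", ns).text)   # transfer 42
print(root.find("music:record", ns).text)  # track 7
# Internally each tag is qualified by its namespace URI, so the two never collide.
print([child.tag for child in root])
```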

At the start, it was popular with people who already knew HTML, and liked the familiar concept of using markup to organize their data.



Answer 15:

It's cross-platform. We use it to encode robot control programs and data for execution in C under VxWorks, but our offline programming is done under .NET. XML is easily parsed by both.



Answer 16:

Another benefit of XML over binary data is error resiliency.

With binary data, if a single bit goes wrong, the data is most likely unusable; with XML, as a last resort, you can still open it up and make corrections.



Answer 17:

It's compatible with many languages.



Answer 18:

The primary advantage it bestows is a system-independent representation of hierarchical data. Comma-delimited text and properties files are more appropriate in many places where XML was used, but the ability to represent complex data structures and data types, its character set awareness, and a standards document allowed it to be used as a good inter-application exchange format.

My minor improvement suggestion for the language is to change the way end tags work. Just imagine how much bandwidth and disk space would be saved if you could end a tag with </>, like <my_tag>blah</> instead of <my_tag>blah</my_tag>. You aren't allowed to have overlapping tags, so I don't know why the standard insists on even more text than it needed. In fact, why use angle brackets at all?

The ugliness of the angle brackets is a good show of what it could have been: JSON. JavaScript Object Notation achieves most of the goals of XML with a lot less typing. Another alternate syntax that makes XML bearable is the Builder syntax, as used by Groovy and Ruby. It's much more natural and readable.



Answer 19:

I'd guess that its popularity originally stemmed from the fact that it solved the right problems in a way that wasn't exceedingly bad for enough big players to gain their support and thus widespread industry adoption. At this point, it's rather strongly embedded into the landscape, since there's so much component development invested around XML. The HIPAA and other EDI XML schemas and adapters that ship with MS BizTalk Server (and BizTalk itself) are a great example of the mountain that's been gradually built on top of XML.



Answer 20:

Compared to some of the previous standards, it's a dream. Try writing HDF (Hierarchical Data Format) files or FITS. FITS was standardised when data was still exchanged on magnetic tape - you have to worry about padding the file into block sizes!
Even CSV isn't as simple. Quick question: what's the separator in a German CSV file?
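To make the German CSV question concrete, here is a Python sketch with a hypothetical export that uses ';' as the field separator because ',' is the decimal mark. Read with the default comma delimiter, the prices get split instead of the columns; the correct delimiter has to be known from outside the file.

```python
import csv
import io

# Hypothetical German-locale export: ';' separates fields, ',' is the decimal mark.
german = "Artikel;Preis\nKaffee;3,50\nTee;2,10\n"

naive = list(csv.reader(io.StringIO(german)))  # default delimiter is ','
print(naive[1])  # ['Kaffee;3', '50']

correct = list(csv.reader(io.StringIO(german), delimiter=";"))
print(correct[1])  # ['Kaffee', '3,50']
```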

A lot of the complaints about XML are from people who use it to transfer data directly between machines where the data only exists for milliseconds. In a lot of areas the data will have to last for 50-100 years and be far more valuable than the machine it ran on. It's worth paying a closing tag tax sometimes.



Answer 21:

The two main things that made XML widely adopted are "human readability" and "Sun Microsystems". There were (and still are) other cross-language, cross-platform data exchange formats that are more flexible, easier to parse, and less verbose than XML, such as ASN.1.



Answer 22:

It is a text format; that is one of its major advantages. Binary formats are usually much smaller, but you always need tools to "read" them. You can simply open an editor and modify XML files to your liking. However, I'd argue it's still a bloated format, though you can compress it quite well. And if one looks at the specs for the Microsoft Office XML formats, one can just imagine how wonderful it is to be seemingly open...
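On compression, a rough Python check with a synthetic, highly repetitive document (the element names are made up). Exact sizes depend on the data, so no specific figures are claimed here; the repeated tag names are exactly the kind of redundancy that gzip's DEFLATE compression removes well.

```python
import gzip

# Synthetic, highly repetitive document: 1,000 rows with identical tag names.
rows = "".join(f"<row><id>{i}</id><status>active</status></row>" for i in range(1000))
raw = f"<rows>{rows}</rows>".encode("utf-8")

packed = gzip.compress(raw)
print(len(raw), len(packed))  # the compressed size is a small fraction of the raw size
```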

Regards Friedrich



Answer 23:

It's easier to write a parser for an XML dialect than for an arbitrary one because of tools that are available.

Using a DOM parser, for example, is much simpler than lex and yacc, especially in Java, where it was popularized.
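The answer mentions Java; as a comparable sketch in Python, the standard library's xml.dom.minidom gives DOM access in a few lines, with no grammar to write. The config document is made up.

```python
from xml.dom import minidom

# Hypothetical config document; the DOM API does the lexing and parsing for us.
doc = minidom.parseString("<config><db host='localhost' port='5432'/></config>")
db = doc.getElementsByTagName("db")[0]
print(db.getAttribute("host"), db.getAttribute("port"))  # localhost 5432
```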