XML for configuration files, why? [closed]

Posted 2019-01-30 18:24

Question:

Why do so many projects use XML for configuration files?

Answer 1:

This is an important question.

Most alternatives (JSON, YAML, INI files) are easier to parse than XML.

Also, in languages like Python -- where everything is source -- it's easier to simply put your configuration in a clearly-labeled Python module.

Yet, some people will say that XML has some advantage over JSON or Python.

What's important about XML is that the "universality" of XML syntax doesn't really apply much when writing a configuration file that's specific to an application. Since portability of a configuration file doesn't matter, some Python folks write their configuration files in Python.
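To make the "configuration in Python" idea concrete, here is a minimal sketch. The setting names (HOST, PORT, LOG_DIR, LOG_FILE), the module name app_settings, and the load_settings helper are all made up for illustration; in a real project the settings would simply live in a file next to the application, and a temporary file stands in for it here.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical settings file contents; normally this is just settings.py on disk.
SETTINGS_SOURCE = '''
HOST = "localhost"
PORT = 8080
LOG_DIR = "/var/log/myapp"
# Derived values are ordinary Python expressions.
LOG_FILE = LOG_DIR + "/server.log"
'''

def load_settings(path):
    """Import a Python file as a module object and return it."""
    spec = importlib.util.spec_from_file_location("app_settings", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file, so load trusted files only
    return module

# Write the sample to a temporary settings.py and load it.
with tempfile.TemporaryDirectory() as tmp:
    settings_path = Path(tmp) / "settings.py"
    settings_path.write_text(SETTINGS_SOURCE)
    settings = load_settings(settings_path)

print(settings.HOST, settings.PORT, settings.LOG_FILE)
```

No parser is involved: the values arrive as real Python objects (PORT is an int, not a string), which is the convenience being described.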


Edit

Security of a configuration file does not matter. The "configuring a Python program in Python is a security risk" argument seems to ignore the fact that Python is already installed and running as source. Why work up a complex hack in a configuration file when you have the source? Just hack the source.

I've heard folks say that "someone" could hack your app via the configuration file. Who's this "someone"? The sysadmin? The DBA? The developer? There aren't a lot of mysterious "someone"s with access to the configuration files.

And anyone who could hack up the Python configuration file for nefarious purposes could probably install keyloggers, fake certificates or other more serious threats.



Answer 2:

  1. XML is easy to parse. There are several popular, lightweight, featureful, and/or free XML parsing libraries available in most languages.
  2. XML is easy to read. It is a very human-readable markup language, so it's as easy for humans to write as it is for computers.
  3. XML is well specified. Everyone and his dog knows how to write decent XML, so there's no confusion about the syntax.
  4. XML is popular. Somewhere along the way, some Important People™ started pushing the idea that XML was the "future", and a lot of people bought it.
  5. XML is a bidirectional format. That is, whitespace, comments, and order are preserved. You can programmatically load, change, and then save it while preserving the formatting. This is important for tools that users can use to configure their applications. It is one of the reasons XML originally took off (the world has become more technical, so this is less of a need).
  6. XML has optional schema validation. Important for tools and complex configuration formats.
  7. XML has namespaces. This allows other configurations or annotations to be embedded without affecting the parsing. In other configuration formats this is usually done as a hack with special comments or property name mangling.
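
Point 5 in the list above (load, change, save, keep the formatting) can be sketched with Python's standard xml.dom.minidom. The element names and values are invented for the example, and whether comments and whitespace survive depends on the parser: minidom keeps them, while xml.etree.ElementTree drops comments by default.

```python
from xml.dom import minidom

# Hypothetical configuration document with a comment and hand-made indentation.
SOURCE = """<config>
  <!-- port the server listens on -->
  <port>8080</port>
  <host>localhost</host>
</config>"""

doc = minidom.parseString(SOURCE)

# Change only the text inside <port>; everything else is left alone.
port = doc.getElementsByTagName("port")[0]
port.firstChild.data = "9090"

# Serialize the root element only, which skips the XML declaration
# that doc.toxml() would otherwise prepend.
updated = doc.documentElement.toxml()
print(updated)
```

The comment and the indentation come back out as they went in, which is what lets a configuration tool edit a file without trampling what a human wrote.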

As a side note, I'm not trying to defend XML. It has its uses, and I will be using it in a project whenever I get back to that. In many cases, though, and especially for configuration files, the only advantage it has is that it's a standardized format, and I think this is far outweighed by numerous disadvantages (e.g. it's too verbose). However, my personal preferences don't matter: I was merely answering why some people might choose to use XML as a configuration file format. I personally never will.



Answer 3:

Because XML sounds cool and enterprisey.

Edit: I didn't realize my answer was so vague, until a commenter requested the definition of enterprisey. Citing Wikipedia:

[...] the term "enterprisey" is intended to go beyond the concern of "overkill for smaller organizations", to imply the software is overly complex even for large organizations and simpler, proven solutions are available.

My point is that XML is a buzzword and as such is being overused. Despite other opinions, XML is not easy to parse (just look at libxml2: its gzipped source package is currently over 3MB). Due to the amount of redundancy, it is also annoying to write by hand. For example, Wikipedia lists XML configuration as one of the reasons for the declining popularity of jabberd in favor of other implementations.



Answer 4:

XML is a well developed and adopted standard, making it easier to read and understand than proprietary configuration formats.

Also, it's worth understanding that XML serialization is a common tool available in most languages that makes saving object data extremely easy for developers. Why build your own way of saving a hierarchy of complex data when someone else has already done the work for you?

.NET: http://msdn.microsoft.com/en-us/library/system.xml.serialization.aspx

PHP: http://us.php.net/serialize

Python: http://docs.python.org/library/pickle.html

Java: http://java.sun.com/developer/technicalArticles/Programming/serialization/
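
As a rough illustration of saving a hierarchy of data as XML, here is a hand-rolled sketch using Python's standard xml.etree.ElementTree. It is not one of the serializers linked above; the to_element and from_element helpers and the settings dictionary are invented for the example, and only nested dictionaries with scalar leaves are handled.

```python
import xml.etree.ElementTree as ET

def to_element(tag, value):
    """Build an element from a scalar or a nested dict (hypothetical helper)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(key, child))
    else:
        elem.text = str(value)
    return elem

def from_element(elem):
    """Inverse of to_element; every leaf comes back as a string."""
    children = list(elem)
    if not children:
        return elem.text or ""
    return {child.tag: from_element(child) for child in children}

settings = {"server": {"host": "localhost", "port": 8080}, "log": {"level": "info"}}

xml_text = ET.tostring(to_element("config", settings), encoding="unicode")
restored = from_element(ET.fromstring(xml_text))

print(xml_text)
print(restored)
```

Note that the round trip is lossy on types: the integer port comes back as the string "8080", which is the sort of detail a real serialization library (or a schema) takes care of for you.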



Answer 5:

Thanks for your answers. This question, as naive as it may seem at first glance, was not so naive :)

Personally, I don't like XML for configuration files. I think it's hard for people to read and change, and it's hard for computers to parse because it's so generic and powerful.

INI files or Java properties files are fine only for the most basic applications that do not require nesting. Common solutions for adding nesting to those formats look like:

level1.key1=value
level1.key2=value
level2.key1=value

Not a pretty sight: there is a lot of redundancy, and it's hard to move things between nodes.
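
For what it's worth, reading that dotted convention back into a nested structure is only a few lines. This is a sketch with a made-up nest_properties helper; it ignores the escaping and line-continuation rules of real Java properties files.

```python
def nest_properties(text):
    """Turn dotted 'a.b=value' lines into nested dictionaries."""
    root = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        dotted_key, _, value = line.partition("=")
        *parents, leaf = dotted_key.strip().split(".")
        node = root
        for name in parents:
            # Walk down, creating intermediate levels as needed.
            node = node.setdefault(name, {})
        node[leaf] = value.strip()
    return root

sample = """\
level1.key1=value
level1.key2=value
level2.key1=value
"""

config = nest_properties(sample)
print(config)
# {'level1': {'key1': 'value', 'key2': 'value'}, 'level2': {'key1': 'value'}}
```

The parsing is easy; the pain is on the writing side, where every line has to repeat its full path.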

JSON is not a bad language, but it's designed to be easy for computers to parse (it's valid JavaScript), so it's not widely used for configuration files.

JSON looks like this:

{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}

In my opinion, it's too cluttered with commas and quotes.
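
On the "easy for computers to parse" side, the sample above loads with a single call to Python's standard json module. The variable names here are just for the example.

```python
import json

# The menu sample from above, unchanged.
MENU_JSON = """
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
"""

config = json.loads(MENU_JSON)

# Nested objects become dicts and arrays become lists.
labels = [item["value"] for item in config["menu"]["popup"]["menuitem"]]
print(labels)  # ['New', 'Open', 'Close']
```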

YAML is good for configuration files. Here is a sample:

invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars

However, I don't like its syntax too much, and I think that using whitespace to define scopes makes things a bit fragile (think of pasting a block at a different nesting level).

A few days ago I started to write my own language for configuration files. I dubbed it Swush.

Here are a few samples. As simple key-value pairs:

key:value
key:value2
key1:value3

Or as a more complex, commented example:

server{
    connector{
         protocol : http // HTTP or BlahTP
         port : 8080     # server port
         host : localhost /* server host name*/
    }

    log{
        output{
             file : /var/log/server.log
             format : %t%s
        }
    }
}

Swush supports strings in the simple form above, or in quotes, which allows whitespace and even newlines inside strings. I am going to add arrays soon, something like:

name [1 2 b c "Delta force"]

There is a Java implementation, but more implementations are welcome :) Check the site for more information (I covered most of it, but the Java API provides a few interesting features like selectors).
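
To show how little machinery a brace-delimited format like the sample needs, here is a line-based sketch in Python. It is not the Swush implementation and does not follow any Swush specification: the parse_swush_like function is hypothetical, it handles only the commented server sample shown above, and it assumes one item per line, single-line comments, and no quoted strings or arrays.

```python
import re

def parse_swush_like(text):
    """Parse 'name{ ... }' blocks and 'key : value' lines into nested dicts."""
    root = {}
    stack = [root]  # innermost open block is stack[-1]
    for raw in text.splitlines():
        # Drop /* ... */ comments, then // or # comments preceded by
        # whitespace (so paths such as /var/log are left alone).
        line = re.sub(r"/\*.*?\*/", "", raw)
        line = re.sub(r"(^|\s)(//|#).*$", "", line).strip()
        if not line:
            continue
        if line.endswith("{"):
            child = {}
            stack[-1][line[:-1].strip()] = child
            stack.append(child)  # open a nested block
        elif line == "}":
            stack.pop()  # close the current block
        else:
            key, _, value = line.partition(":")
            stack[-1][key.strip()] = value.strip()
    return root

SAMPLE = """
server{
    connector{
         protocol : http // HTTP or BlahTP
         port : 8080     # server port
         host : localhost /* server host name*/
    }

    log{
        output{
             file : /var/log/server.log
             format : %t%s
        }
    }
}
"""

config = parse_swush_like(SAMPLE)
print(config["server"]["connector"]["port"])   # 8080
print(config["server"]["log"]["output"]["file"])  # /var/log/server.log
```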



Answer 6:

One other point: if you have an XSD (schema file) that describes your configuration file, it is trivial for your application to validate the configuration file.



Answer 7:

Because parsing XML is relatively easy, and if your schema is clearly specified, any utility can read and write information easily into it.



Answer 8:

Well, XML is a general-purpose specification that can hold descriptions, nested information, and data about something. And there are many APIs and programs that can parse and read it.

So it's much easier to describe something in a formal way that is understood across platforms and applications.



Answer 9:

Here are some historical reasons:

  • The W3C moved from building tools in Perl to Java
  • The Apache foundation moved from building tools in Perl to Java
  • Java has lots of XML APIs
  • Configuration can therefore be done in Java
  • Configuration via XML and properties files is for non-Java developers

JTidy configuration vs tidy configuration is a prime example of this.



Answer 10:

It's because XML allows you to basically make your own semantic markup, which can be read by a parser built in virtually any language. An added benefit is that a configuration file written in XML can be used on projects where you are using two or more languages. If you were to make a configuration file where everything was defined as variables for a specific language, it would only work in that language, obviously.



Answer 11:

The main advantage of XML, and the reason it is so popular, is that it's popular in the Java world, so all of the enterprise applications written in Java use it. Web services and SOAP are also based on XML, and those are used a lot in enterprise applications.

So far, JSON and the other formats aren't as well supported by the industry, except in Ajax applications. Also, JSON does not have a schema language or a defined parsing API like XML does.

That said, roughly speaking, JSON doesn't need the tons of machinery XML has, at least not in the same way. I'm speaking about web services when I say that.



Answer 12:

One reason that was not mentioned in the other answers is Unicode / text encoding / you name it. Need a Chinese string in the file? No problem. This might sound trivial, but when XML was introduced it wasn't. It was obviously not the case for INI files.
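
A small sketch of what that buys you, using Python's standard xml.etree.ElementTree: the file states its own encoding in the XML declaration, and the parser decodes the bytes accordingly. The greeting element and its contents are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; {enc} is filled in with the declared encoding.
TEMPLATE = '<?xml version="1.0" encoding="{enc}"?><greeting lang="zh">你好</greeting>'

decoded = {}
for enc in ("UTF-8", "UTF-16"):
    # Encode the same text two different ways, as it might sit on disk.
    raw_bytes = TEMPLATE.format(enc=enc).encode(enc)
    root = ET.fromstring(raw_bytes)
    decoded[enc] = root.text

print(decoded)
```

The bytes on disk differ completely between the two encodings, yet the application sees the same string either way, without having to guess a code page.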

Another thing: it was the first format that gave us the possibility of having structured data with lists, dictionaries, or whatever you want, which is machine-processable and human-editable at the same time.

It has disadvantages, but what else could you use? YAML looks great, but I'm afraid to introduce it in projects I work on, because I can just imagine all the problems with people putting whitespace in the wrong place, or merge tools not caring about it.