I was wondering if there are any design patterns, guidelines or documented wisdom/best practices for creating 'application configuration' structure, data, and files.
I realize this question has been partially touched on in some posts, but I hope the following questions prompt a look at the topic from another angle.
- Basically, what kind of analysis goes into creating the configuration structure?
- What kinds of forces are at play that one needs to consider?
- When does application configuration analysis/creation come into play? Is it an afterthought or follow-up to the main design activity (i.e., dictated by the application design), or is it interdependent with the main design and a structuring/architecture effort in its own right?
- What are the pros and cons of structuring configuration data one way rather than another?
- What kinds of requirements need to be captured or kept in mind (flexibility, override capability, lack of duplication, selection, ...)?
- What is the cost paid in developing bad application configurations?
Specifically my interest is in developing the hierarchy of configuration settings.
Are there any actual projects out there with sufficient level of sophistication whose configuration could be studied?
My question is not aimed at format or type of files (whether to use flat ini, or json, xml, ...) but at how to arrive at configurations in the first place.
Thanks
Configuration should definitely be read at runtime (during application startup) rather than during the application build/compilation phase. Your binaries should always be the same, no matter where they are running.
During startup, the application should receive the correct configuration (from the filesystem, an environment variable, a configuration server, etc.). Another possibility is to receive (from the same source) information about which configuration should be used, and then load the correct configuration already bundled in your package. The second option is less flexible but may be easier to start with.
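A minimal sketch of both options, assuming the environment variables `APP_CONFIG_FILE` and `APP_ENV` and the bundled JSON configurations are hypothetical names chosen for illustration. The same code runs everywhere; only the environment it starts in decides which configuration it gets.

```python
import json
import os

# Hypothetical configurations bundled inside the package (the second option).
# In a real package these would typically be separate files shipped with the build.
BUNDLED_CONFIGS = {
    "dev": '{"db_url": "postgres://localhost/dev", "log_level": "DEBUG"}',
    "prod": '{"db_url": "postgres://db.internal/prod", "log_level": "WARN"}',
}


def load_config(environ=os.environ):
    """Select and load configuration at startup, never at build time.

    Option 1: APP_CONFIG_FILE points at a file supplied by the environment.
    Option 2: APP_ENV names one of the configurations bundled with the package.
    Both variable names are hypothetical.
    """
    path = environ.get("APP_CONFIG_FILE")
    if path:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    env_name = environ.get("APP_ENV", "dev")
    try:
        return json.loads(BUNDLED_CONFIGS[env_name])
    except KeyError:
        raise ValueError(f"unknown environment: {env_name!r}") from None
```

Passing `environ` as a parameter (instead of always reading `os.environ`) also makes the selection logic easy to exercise in tests.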
It's extremely useful to be able to override specific pieces of configuration in the development environment (e.g., via the command line). Sometimes you can also use this to quickly debug a production problem. In general, however, I would avoid inheritance and overriding of pieces of configuration, because it quickly becomes hard to manage and hard to see the whole set of available properties. I advise splitting the configuration into two parts: one common to all environments (constants, paths, the company's mail server, etc.) and a second for the things that vary between runtimes (URLs, DB passwords, etc.).
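The two-part split plus command-line overrides could look roughly like the sketch below. All setting names, values and the `--env`/`--set` flags are hypothetical. Overrides are only allowed for keys that already exist, which keeps the full set of available properties visible in the two parts instead of scattered across override layers.

```python
import argparse

# Part 1: settings common to every environment (constants, paths, mail server).
COMMON = {
    "mail_server": "smtp.example.com",
    "data_dir": "/var/lib/myapp",
    "db_url": None,  # must be supplied by the environment-specific part
}

# Part 2: settings that vary between runtimes; one flat dict per environment.
PER_ENVIRONMENT = {
    "dev": {"db_url": "postgres://localhost/dev", "db_password": "dev"},
    "prod": {"db_url": "postgres://db.internal/prod", "db_password": "s3cret"},
}


def build_config(env_name, overrides=()):
    """Merge the two parts, then apply 'key=value' overrides (e.g. from the CLI)."""
    config = {**COMMON, **PER_ENVIRONMENT[env_name]}
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        if key not in config:
            # Refuse to invent new settings through overrides.
            raise KeyError(f"unknown setting: {key!r}")
        config[key] = value
    return config


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", default="dev", choices=sorted(PER_ENVIRONMENT))
    parser.add_argument("--set", action="append", default=[], metavar="KEY=VALUE")
    return parser.parse_args(argv)
```

For example, `build_config(args.env, args.set)` after `parse_args(["--env", "dev", "--set", "db_url=postgres://localhost/debug"])` gives the dev configuration with a single setting replaced for a debugging session.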
And you have to keep testing in mind: your components have to be properly initialized during integration tests. Think about it up front, or later you will have to fight with your framework.
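One way to think about testing up front, sketched under the assumption that components receive their settings through the constructor rather than reading a global file themselves. `MailConfig`, `Notifier` and `make_test_config` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MailConfig:
    server: str
    port: int
    sender: str


class Notifier:
    """Hypothetical component: its settings are passed in, not looked up globally."""

    def __init__(self, config: MailConfig):
        self.config = config

    def describe_target(self) -> str:
        return f"{self.config.sender} via {self.config.server}:{self.config.port}"


def make_test_config() -> MailConfig:
    # An integration test builds its configuration explicitly,
    # e.g. pointing at a local fake SMTP server.
    return MailConfig(server="localhost", port=2525, sender="test@example.com")
```

Because the component never decides where its configuration comes from, a test can initialize it without special support from the framework.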
The basic point that I got from Ciaran McHale's answer is that, if eventual users of a package might have multiple deployments, the package should come with an interface to efficiently generate the different configuration files. Your question is what principles should guide the underlying structure of the configuration files. Even if we create an interface, this question is still relevant, because the answer will guide the design and implementation of the interface.
One key point that is not subjective is that configuration is the adaptation of generic code to its environment, and we can always separate the environment into two parts: the digital or server realm, and the non-digital or application realm. So it is a general, non-subjective fact that configuration values can be separated into three categories: those that depend only on the server, those that depend only on the application, and those that depend on both. I think it is important to separate these three categories into different files, especially if they are managed directly, without an interface.
In the same way, each file should be partitioned into sections that correspond to the different parts of the environment with which the package interacts: the database, the email server, etc. If modules need to be configured, they are like part of the environment for the core, and the same principle applies. My observation is that some large systems even use different files for each section. For example, Apache has different conf files for the different sites and different modules.
Another point that comes to mind is that there is no reason why functions, as first-class objects, should not be used as configuration values: if a function needs to be defined differently on different servers or in different applications, then it is a configuration value. I cannot think of any other general principles.
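A small sketch of the three categories, each partitioned into sections, with a function used as a configuration value. The three dicts stand in for three separate files, and every host name, path and key is hypothetical. The merge refuses duplicate keys, so each value lives in exactly one category.

```python
# Values that depend only on the server (the digital realm).
SERVER = {
    "database": {"host": "db01.internal", "port": 5432},
    "email": {"smtp_host": "mail.internal"},
}

# Values that depend only on the application.
APPLICATION = {
    "database": {"name": "orders"},
    "email": {"sender": "orders@example.com"},
}

# Values that depend on both; a function is a first-class configuration value.
BOTH = {
    "email": {"subject_prefix": "[orders@db01]"},
    "storage": {
        "archive_path": lambda order_id: f"/srv/db01/orders/{order_id:08d}.json",
    },
}


def merged_section(name):
    """Combine one section (e.g. 'database') from the three categories.

    A key defined in more than one category is treated as a structural error.
    """
    result = {}
    for category in (SERVER, APPLICATION, BOTH):
        for key, value in category.get(name, {}).items():
            if key in result:
                raise ValueError(f"{name}.{key} is defined in more than one category")
            result[key] = value
    return result
```

For instance, `merged_section("storage")["archive_path"](42)` yields a path built from both the server's directory layout and the application's name.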
Many applications are deployed (that is, installed, configured and run) multiple times. For example, an application might be deployed on the developer's machine; then on one or more testing machines (with a name such as UAT, Staging or pre-production); then possibly on multiple production machines. The multiple deployments in production might arise for a variety of reasons, for example:
- You want to deploy multiple replicas of a server application, and then use a load-balancing technique to route client requests to those server replicas.
- You want to deploy a separate instance of your application in each geographical location where your company has an office.
- Your application has a plug-in architecture, and you want to deploy several instances of the application, each with a different set of plug-ins.
You will probably need to have a "mostly similar but slightly different" set of configuration files for each deployment of your application. That can result in a maintenance problem if you use a copy-and-paste approach to creating those multiple sets of configuration files. I know of two approaches to reducing that problem.
The first approach is to use Config4* for your configuration-file syntax (disclaimer: I am the main maintainer of Config4*), because its syntax provides a variety of ways, such as "if-then-else" and "include" statements, to enable a configuration file to adapt to its deployment environment while still reusing common name=value settings. I suggest you read Chapters 2 and 3 of the Config4* Getting Started Guide (available in HTML and PDF) to get an overview of its syntax and API. Unfortunately, as Murphy's Law would have it, your application will make use of some third-party libraries that use something other than Config4* (e.g., XML or Java properties files) for their configuration, so Config4* won't help you with those. However, I think it is still worthwhile reading the above-mentioned documentation, since some of Config4*'s abilities might provide food for thought.
The second approach is to write a "template" for each kind of configuration file, where the template contains mostly plain text, but with a few placeholders. Here is an example template configuration file, using the notation ${foo} to denote a placeholder.
```
serverName = "${serverName}";
listenPort = "${serverPort}";
logDir = "/data/logs/${serverName}";
idleTimeout = "5 minutes";
workingDir = "/tmp";
```
If you do that for all the configuration files used by your application, then you will probably find that performing a global search-and-replace on the template configuration files, with values for a relatively small number of placeholders, will yield the ready-to-run configuration files for a particular deployment. I have used this approach in a project where there were over 2,000 lines of configuration files per deployment and about 60 deployments, resulting in over 100,000 lines of configuration files. The search-and-replace data applied to the template files had about 50 times fewer lines than the ready-to-deploy configuration files produced from it, and was therefore significantly easier to maintain. If you are looking for an easy way to perform global search-and-replace on placeholders in template files, then you might want to consider Apache Velocity.
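As a rough illustration of the template approach (in Python rather than with Apache Velocity), the standard library's `string.Template` happens to use the same `${name}` placeholder notation as the example template above. The deployment names and values below are hypothetical; the point is that a small table of placeholder values expands into one full configuration file per deployment.

```python
from string import Template

# The example template (every statement ending in a semicolon).
TEMPLATE = """\
serverName = "${serverName}";
listenPort = "${serverPort}";
logDir = "/data/logs/${serverName}";
idleTimeout = "5 minutes";
workingDir = "/tmp";
"""

# The small per-deployment search-and-replace table (hypothetical values).
DEPLOYMENTS = {
    "london": {"serverName": "ldn-app1", "serverPort": "8080"},
    "tokyo": {"serverName": "tyo-app1", "serverPort": "9090"},
}


def render_all(template_text, deployments):
    """Return {deployment name: ready-to-run configuration text}.

    Template.substitute raises KeyError if a placeholder has no value, so a
    forgotten setting fails loudly instead of leaking '${...}' into a deployment.
    """
    template = Template(template_text)
    return {name: template.substitute(values) for name, values in deployments.items()}


if __name__ == "__main__":
    for name, text in render_all(TEMPLATE, DEPLOYMENTS).items():
        print(f"--- {name} ---")
        print(text)
```

Adding a deployment means adding one small entry to `DEPLOYMENTS`, not copying and editing a whole set of files.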
Independent of whether you hand-write configuration files or generate them from templates is the issue of run-time validation of the data in the configuration files. Unfortunately, relatively few configuration-file formats provide schema validation. The only ones I know of are as follows. Config4* provides an easy-to-use schema validation engine (discussed in Chapter 3 of the documentation mentioned above). There is also a schema validation language for JSON. And, of course, there are numerous schema validation languages for XML, but they typically have a steep learning curve.
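To show what run-time validation buys you, here is a deliberately tiny hand-rolled checker. It is a hypothetical sketch, not Config4*'s schema language or JSON Schema: the schema maps each setting name to an expected type and a required flag, and unknown names are reported as likely typos.

```python
# Hypothetical schema: setting name -> (expected type, required?)
SCHEMA = {
    "serverName": (str, True),
    "listenPort": (int, True),
    "idleTimeout": (str, False),
}


def validate(config, schema=SCHEMA):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for name, (expected_type, required) in schema.items():
        if name not in config:
            if required:
                problems.append(f"missing required setting: {name}")
        elif type(config[name]) is not expected_type:
            # Exact type check, so e.g. True is not accepted where an int is expected.
            problems.append(
                f"{name}: expected {expected_type.__name__}, got {type(config[name]).__name__}"
            )
    for name in config:
        if name not in schema:
            problems.append(f"unknown setting (typo?): {name}")
    return problems
```

Running such a check at startup turns a misspelled or mistyped setting into an immediate, explicit error rather than a silent fallback to a default.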