What is the basic difference between convergence and idempotence in Chef?
Convergence and idempotence are not Chef-specific. They're generally attributed to configuration management theory, though they are also used in other fields, notably mathematics.
Let's start with the more basic term, idempotent. We're going to ignore the mathematical use of idempotent, and focus instead on what configuration management people mean when they talk about it. That is: "multiple applications of the same action do not have side effects on the system state." A simple example of an idempotent operation is mkdir -p: no matter how many times we run this command, it will result in that tree being created. Another way of stating this about idempotent operations is, "running the tool over and over doesn't change the system after the first time."
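To make the mkdir -p point concrete, here is a minimal sketch in plain Ruby, assuming FileUtils.mkdir_p as a stand-in for the shell command. The nested path and the tree_snapshot helper are made up for illustration; the point is only that the second run leaves the directory tree exactly as the first run did.

```ruby
require "fileutils"
require "tmpdir"

# Snapshot of every path under +root+, used to compare system state.
def tree_snapshot(root)
  Dir.glob("**/*", base: root).sort
end

# A throwaway sandbox; the nested path below is purely illustrative.
sandbox = Dir.mktmpdir("idempotence-demo")
target = File.join(sandbox, "var", "brand-new", "tree")

FileUtils.mkdir_p(target)            # first run creates the tree
after_first = tree_snapshot(sandbox)

FileUtils.mkdir_p(target)            # second run: no error, nothing new
after_second = tree_snapshot(sandbox)

puts after_first == after_second     # compares state before and after the rerun
```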
Now to contrast that with convergence. Generically, to converge means to bring [people or] things together. In configuration management, convergence means to bring the system state in line with a defined policy. That is, changes are made on the system only if they need to be made. A simple example of a convergent operation is:
This is convergent because we're only executing the mkdir command if the desired directory does not exist. We also call this a "test and repair" operation. That is, we test the current state of the specific thing we're managing, and then repair it with a specific command or operation if it is not in that state. That is what Chef does behind the scenes with a resource like this:
The way we (Chef) talk about this is that Chef takes idempotent actions to converge the system to the state declared by the various resources. Every resource in Chef is declarative, and performs a test about the current state of the resource, and then repairs the system to match that.
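The "test and repair" idea can be sketched in a few lines of plain Ruby. This is not Chef's implementation, just an illustration under simple assumptions: converge_directory is a hypothetical helper, and the path lives in a temporary sandbox.

```ruby
require "tmpdir"

# Test-and-repair for a single directory: check the current state first,
# and only act when it differs from the desired state.
# Returns true if a repair was needed, false if nothing had to be done.
def converge_directory(path)
  return false if Dir.exist?(path)   # test: already in the desired state
  Dir.mkdir(path)                    # repair: bring it into that state
  true
end

sandbox = Dir.mktmpdir("test-and-repair")
path = File.join(sandbox, "brand-new-directory")  # hypothetical path

first_run  = converge_directory(path)
second_run = converge_directory(path)
puts "first run repaired: #{first_run}, second run repaired: #{second_run}"
```

The first call has to repair, the second finds the directory already present and does nothing.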
To get deeper into the weeds about how Chef works, a Chef run has a "compile" phase and a "converge" phase. In the "compile" phase, it evaluates the Ruby recipes on the node, looking for resource objects that it adds to a "resource collection." Once it has evaluated all the recipes, it enters the "converge" phase, where it iterates over the resource collection and takes the appropriate action to put each resource into the desired state: users are created, files are written, packages are installed, and so forth.
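A toy model may help picture the two phases. Everything here is an assumption made for illustration: ToyResource and ToyRun are invented names, a Hash stands in for the node's real state, and none of this mirrors Chef's actual classes. It only shows that compiling records intent without changing anything, while converging walks the collection and repairs what differs.

```ruby
# Toy model only: ToyResource and ToyRun are hypothetical names and do
# not correspond to Chef's real classes.
ToyResource = Struct.new(:name, :desired)

class ToyRun
  attr_reader :resource_collection, :system_state

  def initialize(system_state)
    @system_state = system_state      # stand-in for the real node
    @resource_collection = []
  end

  # Called from "recipes": records intent, changes nothing yet.
  def resource(name, desired)
    @resource_collection << ToyResource.new(name, desired)
  end

  # Compile phase: evaluate every recipe to build the collection.
  def compile(recipes)
    recipes.each { |recipe| instance_eval(&recipe) }
  end

  # Converge phase: walk the collection in order, repairing as needed.
  # Returns the names of the resources that had to be changed.
  def converge
    resource_collection.each_with_object([]) do |res, updated|
      next if system_state[res.name] == res.desired   # test
      system_state[res.name] = res.desired            # repair
      updated << res.name
    end
  end
end

recipes = [
  proc { resource("user[deploy]", :present) },
  proc { resource("package[nginx]", :installed) },
]

run = ToyRun.new({ "user[deploy]" => :present })
run.compile(recipes)
state_after_compile = run.system_state.dup   # compile changed nothing
updated = run.converge
puts "updated: #{updated.inspect}"
```

In this sketch the user is already present, so only the package is reported as updated, and a second converge would report nothing.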
Disclaimer: I'm an outsider in the configuration management community and it took me hours of reading to figure out what follows. I criticise the configuration management community in this answer, so you should be aware that their world is not my world, I don't even use any configuration management tools in my current job, and I'm judging them only on what I can find on Google.
Definitions
To say that an operation is convergent roughly means that it puts whatever part of the system it manages into a specified state.
When configuration management people say that an operation is idempotent, they typically mean that if you run it a second time right after running it once, the second run will terminate immediately without doing any redundant work.
When a resource is described as idempotent in the context of Chef in particular, it means that subsequent Chef runs after the resource has already been put into the desired state don't count it as "updated" in the x/y resources updated message at the end of the run.
Note that most built-in resources satisfy this final, strictest definition of idempotence by default, and you can achieve it in your own recipes and custom resources by using only_if and not_if guards and converge_if_changed.
Some commentary, and a note about other definitions
Confusingly, the majority of definitions of "idempotent" you find on the internet will not match either of the ones I've just given. Rather than trusting what the experts say the definition is, I'm inferring it from observing how they actually use the term. It is infuriatingly common to find somebody give a definition of "idempotent" and then use the word in a way that clearly doesn't cohere with that definition a few paragraphs later.
To explore this, let's start by exploring definitions of "idempotent" that exist outside of the field of configuration management. Lots of such definitions are listed on Wikipedia at https://en.wikipedia.org/wiki/Idempotence. The ones that most frequently get (wrongly) given as the meaning of idempotence in a configuration management context are as follows:
A slew of confused sources give one of these definitions as the meaning of "idempotent" in a configuration management context, and then promptly go on to use the term in a way that makes clear that it's not really the definition they're using. Some examples:
Pace's competing answer to this very question. There he claims that:
but then goes on to give this as an example of a step that isn't idempotent:
Clearly this step does in fact meet Pace's definition of idempotence, since running it multiple times in a row gets us to the same final state as running it once (namely, a state where /var/log/myapp exists and is empty). However, it does redundant work when run a second time, and so Pace describes it as non-idempotent.
Mischa Taylor and Seth Vargo's book Learning Chef: A Guide to Configuration Management and Automation. In there, they claim:
but then later, commenting on one of their example recipes:
Again, they're stating a definition of idempotence based upon the system state being unchanged when a recipe is run multiple times, but actually using the word to mean that unnecessary work is avoided.
Ben Ford's Idempotence: not just a big and scary word on the Puppet blog, where he first gives this definition of idempotence...
Then he gives this example of idempotence, which, while it is consistent with the definition given above, is a little suspicious - because it focuses on subsequent executions not doing redundant work, rather than on them reaching the same result:
Then finally he throws his definition to the wind and gives an example of an operation that he claims is not idempotent despite the fact that repeated executions of it will reach the same result:
But the result will be the same, Ben! Earlier, that's the detail that you told us the definition of idempotence revolved around!
Clearly, Ben is really applying the "avoid redundant work" definition of idempotence, despite what he claims.
Can an operation be idempotent but not convergent?
Yes. Such an operation would be one that doesn't bring the system to a specified end state, but does avoid redundant work on consecutive runs. Pace's answer gives an example of such an operation, and Thinking Like A Chef provides another:
The best practical example I can think of for an operation that you could characterise as idempotent-but-not-convergent would be installing a package using a typical package manager's install command that:
The state (the version of the package you get) is not determined by the recipe, so it's arguably not convergent, but it successfully avoids unnecessary work.
Can an operation be convergent but not idempotent?
Yes, absolutely! A simple example is Ben Ford's one, already quoted above, of unconditionally downloading a file to some local path. It's convergent, because the end state is always the same (the file exists), but is not idempotent, because it does the unnecessary work of redownloading the file each time it runs.
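Here is a rough sketch of that contrast in plain Ruby. To keep it self-contained there is no real download: a fixed string stands in for the remote file, and fetch_always and fetch_if_missing are hypothetical helpers. Both reach the same end state, but only the guarded one avoids redundant work on the second run. Note that the guard only checks that the file exists, matching the end state described above; it does not compare contents.

```ruby
require "tmpdir"

# Unconditional step: always rewrites the file. Returns true to signal
# that work was done.
def fetch_always(path, payload)
  File.write(path, payload)
  true
end

# Guarded step: same end state, but skips the write when the file is
# already there. Returns whether any work was done.
def fetch_if_missing(path, payload)
  return false if File.exist?(path)
  File.write(path, payload)
  true
end

payload = "pretend this came from a remote server\n"  # stand-in for a download
dir = Dir.mktmpdir("converge-demo")
a = File.join(dir, "always.txt")
b = File.join(dir, "guarded.txt")

always_work  = [fetch_always(a, payload), fetch_always(a, payload)]
guarded_work = [fetch_if_missing(b, payload), fetch_if_missing(b, payload)]

puts File.read(a) == File.read(b)   # do both end up with the same content?
puts always_work.inspect            # work flags for the unconditional step
puts guarded_work.inspect           # work flags for the guarded step
```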
For what it's worth, I find it frustrating that the configuration management community have appropriated a term that already had a clear meaning in the broader world of programming and then used it in a related but still clearly different way, without ever providing a formal definition of what it means in their world. A search of the Chef docs (https://www.google.co.uk/search?q=site%3Ahttps%3A%2F%2Fdocs.chef.io+idempotent) yields many uses of the term, but no definition. It's not surprising that this topic confuses people when most of the definitions of the term floating around don't match the usage.
I've only managed to find one person who has ever given definitions of idempotence that are consistent with the way the term is used, and that's coderanger (aka Noah Kantrowitz). In Thinking Like A Chef, which I quoted from previously, he writes:
and in an IRC conversation from 2015 he writes:
Other than this one man, I have literally not been able to find anyone else who has ever given a definition of the term that matches how the whole configuration management community seems to use it.
@Mark Amery asked for a more satisfying example of the difference between the two so I will endeavor to provide that.
A step is convergent if, at the successful conclusion of the step, the system has been brought into a known state.
A step is idempotent if, after multiple executions of the step on a system (whose underlying state has not changed), the result is the same as if the step had been executed once.
Convergence without idempotence
A step that is convergent but not idempotent is:
At the successful conclusion of the step we know for a fact that /var/log/myapp is an empty directory that exists.
It is not idempotent because it blows away the /var/log/myapp directory each time. Idempotence is desirable because it reduces unnecessary churn on the system. Obviously any applications writing to the /var/log/myapp directory would not be happy with the above step.
Idempotence without convergence
A step that is idempotent but not convergent is:
That script will create a file with a random name in /home/foo only if there are no files in /home/foo. This is idempotent: after the first run the directory will not be empty, so future runs will do nothing.
However, it is not convergent. You could not say this step is putting the system into any kind of known state because the file that gets created will have a random name.
Convergence is desired because it helps to produce systems that are in identical states and are thus more likely to behave predictably.
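A minimal Ruby sketch of a step like the one described, under some assumptions: two temporary directories stand in for /home/foo on two different machines, and create_random_file_if_empty is a hypothetical helper that uses SecureRandom for the file name. Rerunning it on one "machine" does nothing, yet the two "machines" do not end up in the same state.

```ruby
require "securerandom"
require "tmpdir"

# Creates a randomly named file in +dir+ only when +dir+ is empty.
# Returns the new file's name, or nil when nothing was done.
def create_random_file_if_empty(dir)
  return nil unless Dir.empty?(dir)
  name = SecureRandom.hex(8)
  File.write(File.join(dir, name), "")
  name
end

# Two temporary directories stand in for /home/foo on two machines.
machine_a = Dir.mktmpdir("foo-a")
machine_b = Dir.mktmpdir("foo-b")

first  = create_random_file_if_empty(machine_a)
second = create_random_file_if_empty(machine_a)   # directory no longer empty
other  = create_random_file_if_empty(machine_b)

puts second.nil?                          # did the rerun skip all work?
puts Dir.children(machine_a) == [first]   # is there still exactly one file?
puts first != other                       # do the two machines differ?
```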
A word of caution
These terms are like abstractions: they are not exact and they can leak. For example, you could state that an operation isn't idempotent because it uses up CPU cycles. You could state that one idempotent test-and-repair operation that does an expensive test is "less idempotent" than another operation that does a cheap test, even though "less idempotent" is not a factual thing.
You could try and state that a step that installs MySQL version X is not convergent because, when run on different machines, it leaves a different timestamp on the files. Alternatively you could say the step I posted above IS convergent because it leaves the system in the state "/home/foo exists and contains exactly one file".
This is what happens when math escapes the chalkboard.