I have a bunch of servers that will need frequent patching. I am planning on using Ansible to coordinate the patching process. The key point here is that it must be "all or nothing" patching: either all servers are patched or none are.
The tasks I was considering for my playbook would be something like:
1. Go to all servers and take an LVM snapshot.
2. If and only if task 1 succeeds on all servers, apply the changes.
3. If any host fails for any reason, roll back the snapshot on ALL nodes.
The problem is that I am new to Ansible and I can't express this in a playbook. I have written this simple test playbook:
---
- hosts: all
  strategy: linear
  tasks:
    - block:
        - debug: msg='Testing on {{ inventory_hostname }}...'
        - command: /home/amirsamary/activity.sh
          changed_when: false
      rescue:
        - debug: msg='Rollback of {{ inventory_hostname }}...'
    - debug: msg='I continued running tasks on {{ inventory_hostname }}...'
I have two hosts in my inventory. On the first node activity.sh succeeds, and on the second node it fails, so node2 will always fail. The problem is that the rescue tasks only run for the failed host, not for all of them (as one would expect anyway), and the playbook keeps running the remaining tasks.
I have heard a lot about how good Ansible is at orchestrating complex tasks on thousands of servers, but I can't seem to find a way to safely implement an "all or nothing" strategy with it. What am I missing?
I bet there are many ways to implement this; here is one of them:
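A minimal sketch of this approach, written to match the steps described below and reusing the question's test script; the rollback is only a `debug` placeholder. It uses Ansible's `failed` test rather than checking whether a `failed` key is defined, because recent Ansible versions include `failed: false` in successful results as well.

```yaml
---
# Sketch only: the rollback task is a debug placeholder.
- hosts: all
  tasks:
    # Run the script on every host; record the result instead of aborting.
    - command: /home/amirsamary/activity.sh
      register: cmd_result
      ignore_errors: true
      changed_when: false

    # With the default linear strategy, cmd_result now exists on every host.
    # This task runs on ALL hosts if cmd_result failed on ANY host in the play.
    - debug:
        msg: "Rollback of {{ inventory_hostname }}..."
      when: >-
        ansible_play_hosts
        | map('extract', hostvars, 'cmd_result')
        | select('failed')
        | list | count > 0
```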
What's done here?
- register the command's result in `cmd_result` and ignore errors, if any;
- with the default `linear` strategy, the `command` task is completed on all hosts before the next task is executed;
- so `cmd_result` is registered for every host;
- collect the `cmd_result` facts for all hosts in the current play, select those that failed, convert them to a list and count them: if there are any, roll back.

So the rollback task will be executed on all hosts if `cmd_result` failed on any of them.

You may want to add this task after the rollback task:
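For example, a `fail` task conditioned on each host's own result, written with the `is failed` test syntax used by current Ansible versions (the message text is illustrative):

```yaml
# Goes in the same tasks list, after the rollback task.
# Fails only the hosts whose own cmd_result failed, so they are
# reported as failed in the play recap after rollback has run everywhere.
- fail:
    msg: "activity.sh failed on {{ inventory_hostname }}"
  when: cmd_result is failed
```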
This way your rollback tasks run on every host, and the problem hosts are still marked as failed.
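Applied to the snapshot/patch/rollback plan from the question, the same pattern might look roughly like this. It is an untested sketch: the volume group and LV names (`vg0`, `root`), the snapshot name and size, and the patch script `/usr/local/bin/patch.sh` are hypothetical placeholders, and `snap_failed_anywhere` / `patch_failed_anywhere` are helper variable names introduced here.

```yaml
---
# Sketch only. HYPOTHETICAL placeholders: vg0, root, root_prepatch, 5G,
# and /usr/local/bin/patch.sh.
- hosts: all
  become: true          # LVM commands need root
  strategy: linear      # every task finishes on all hosts before the next begins
  vars:
    vg_name: vg0
    lv_name: root
    snap_name: root_prepatch
    # Evaluated when used: true if the registered result failed on ANY play host.
    snap_failed_anywhere: >-
      {{ ansible_play_hosts | map('extract', hostvars, 'snap_result')
      | select('failed') | list | count > 0 }}
    patch_failed_anywhere: >-
      {{ ansible_play_hosts | map('extract', hostvars, 'patch_result')
      | select('failed') | list | count > 0 }}
  tasks:
    # Step 1: take an LVM snapshot everywhere, recording failures.
    - command: lvcreate --snapshot --size 5G --name {{ snap_name }} /dev/{{ vg_name }}/{{ lv_name }}
      register: snap_result
      ignore_errors: true

    # If any snapshot failed, stop on ALL hosts before anything is patched.
    # Snapshots that did succeed are left in place for manual cleanup.
    - fail:
        msg: "Snapshot failed on at least one host; nothing was patched"
      when: snap_failed_anywhere | bool

    # Step 2: apply the patches, again recording failures instead of aborting.
    - command: /usr/local/bin/patch.sh
      register: patch_result
      ignore_errors: true

    # Step 3: if patching failed anywhere, merge the snapshot back on ALL hosts.
    # For an origin LV that is in use, LVM defers the merge until the origin
    # is next activated (typically the next reboot).
    - command: lvconvert --merge /dev/{{ vg_name }}/{{ snap_name }}
      when: patch_failed_anywhere | bool

    # If every host patched cleanly, drop the snapshots.
    - command: lvremove -f /dev/{{ vg_name }}/{{ snap_name }}
      when: not (patch_failed_anywhere | bool)
```

Keeping the "failed anywhere" expressions in play `vars` works because Ansible templates variables when they are referenced, so each one is evaluated only after the corresponding result has been registered on every host. The `| bool` guards against the expression coming back as the string `"True"` or `"False"`.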