When writing and debugging Ansible playbooks, the typical workflow is as follows:
- Run ansible-playbook ./main.yaml
- The playbook fails on some task
- Fix this task and repeat the first step, waiting for all previous tasks to execute again, which takes a lot of time
Ideally, I'd like to resume execution at the failed task, keeping the inventory and all facts collected by previous tasks. Is that even possible? How can I make playbook writing and debugging faster?
Take a look at http://docs.ansible.com/playbooks_startnstep.html.
If you want to start executing your playbook at a particular task, you can do so with the --start-at-task option:
ansible-playbook playbook.yml --start-at-task="install packages"
The above will start executing your playbook at a task named “install packages”.
Alternatively, take a look at this previous answer: How to run only one task in ansible playbook?
Finally, when a play fails, it usually gives you something along the lines of:
PLAY RECAP ********************************************************************
to retry, use: --limit @/home/user/site.retry
Use that --limit option and it should retry from the failed task.
Future readers:
The --limit @/home/user/site.retry option would not help in such a scenario: the .retry file only stores the failed hosts and nothing more, so it will just execute all tasks against the failed hosts.
If you are using the latest version (Ansible 2.x), --start-at-task does not work for tasks defined inside roles.
You can achieve a similar effect by using the --step flag, e.g. ansible-playbook playbook.yml --step. Before executing each task, it prompts you to choose (N)o/(y)es/(c)ontinue.
With this approach you can selectively execute tasks when needed and, after fixing the problem, continue from the point where the play failed.
Future Future readers:
As of Ansible 2.4.2.0, --start-at-task works for tasks defined in roles I created.
The Ansible team is not willing to address this issue; they suggest you keep your roles idempotent and replay the entire play. I don't have time for this. In my roles I am not using a massive amount of facts like @JeremyWhiting, so I can use this --start-at-task feature.
This is still a manual task, however, so instead I wrote some ansible rpm and added a "Resume" feature that follows these basic steps:
- Enable the ansible log via /etc/ansible/ansible.cfg (uncomment log_path)
- Clear the log before each run
- After a failure, the "Resume" feature greps this log for the last "TASK" line, and uses sed to get what is inside the "[]"
- Then it simply calls the last run play, with --start-at-task="$start_at_task"
- Ensure that you have "any_errors_fatal: true" in your roles to stop the play at the failing task you wish to resume from
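The steps above can be sketched roughly as follows. This is only a minimal illustration, not the script I actually use: the function name last_task_name and the log path in the usage comment are hypothetical, and it assumes the log contains lines of the form "TASK [task name] ****", as in the default output.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the name of the last task recorded in an
# Ansible log file. Assumes task banners look like "TASK [name] ****".
last_task_name() {
    local log_file="$1"
    # Keep only task banner lines, take the last one, then strip everything
    # up to "TASK [" and the closing "]" with its trailing asterisks.
    grep 'TASK \[' "$log_file" \
        | tail -n 1 \
        | sed -e 's/^.*TASK \[//' -e 's/\][ *]*$//'
}

# Example usage (the log path is an assumption; use whatever log_path is set
# to in ansible.cfg, and clear the file before each run):
#   start_at_task="$(last_task_name /var/log/ansible.log)"
#   ansible-playbook playbook.yml --start-at-task="$start_at_task"
```

If the log contains no task banner at all, the function prints nothing, so the calling script should check for an empty result before passing it to --start-at-task.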
The Ansible team is unwilling to create this basic (and very useful) feature, so the only choice is to hack it together via some bash scripts.