Question:
Summary: A better way to abort an Ansible playbook immediately if any host is unreachable.
Is there a way to abort an Ansible playbook if any one of the hosts is unreachable? What I find is that if Ansible cannot reach a host, it still continues and executes all the plays/tasks in the playbook.
In all my playbooks I specify a max_fail_percentage of 0, but in this case Ansible does not complain, since all the hosts that are reachable can execute all the plays.
Currently I have a simple but hacky solution, and I would like to know if there is a better answer.
My current solution:
As the first step of running a playbook, Ansible gathers facts for all the hosts, and it cannot do so for a host that is not reachable. So I put a simple play at the very beginning of my playbook that uses a fact. If a host is unreachable, that task fails with an "Undefined variable" error. The task is just a dummy and will always pass if all hosts are reachable.
See my example below:
- name: Check Ansible connectivity to all hosts
  hosts: host_all
  user: "{{ remote_user }}"
  sudo: "{{ sudo_required }}"
  sudo_user: root
  connection: ssh # or paramiko
  max_fail_percentage: 0
  tasks:
    - name: check connectivity to hosts (Dummy task)
      shell: echo " {{ hostvars[item]['ansible_hostname'] }}"
      with_items: "{{ groups['host_all'] }}"
      register: cmd_output
    - name: debug ...
      debug: var=cmd_output
In case a host is unreachable you will get an error as below:
TASK: [c.. *****************************************************
fatal: [172.22.191.160] => One or more undefined variables: 'dict object' has no attribute 'ansible_hostname'
fatal: [172.22.191.162] => One or more undefined variables: 'dict object' has no attribute 'ansible_hostname'
FATAL: all hosts have already failed -- aborting
Answer 1:
Alternatively, this looks simpler and more expressive:
- hosts: myservers
  become: true
  pre_tasks:
    - name: Check ALL hosts are reachable before doing the release
      assert:
        that:
          - ansible_play_hosts == groups.myservers
        fail_msg: 1 or more host is UNREACHABLE
        success_msg: ALL hosts are REACHABLE, go on
      run_once: yes
  roles:
    - deploy
https://github.com/ansible/ansible/issues/18782#issuecomment-319409529
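One thing to watch with the comparison above: groups.myservers always holds the full inventory group, so the assertion would also fail when the run is deliberately restricted with --limit. A hedged variant, assuming an Ansible version that provides the magic variable ansible_play_hosts_all (the hosts the play targeted), compares that list against ansible_play_hosts (the hosts still active) and names the missing ones. The group myservers and the role deploy are carried over from the answer; adding any_errors_fatal is my own assumption, meant to make sure one failed assertion stops every host.

```yaml
- hosts: myservers
  become: true
  any_errors_fatal: true   # assumption: stop all hosts if the check fails
  pre_tasks:
    - name: Check that no targeted host dropped out during fact gathering
      assert:
        that:
          # Hosts targeted by the play minus hosts still active = unreachable/failed hosts
          - ansible_play_hosts_all | difference(ansible_play_hosts) | length == 0
        fail_msg: "UNREACHABLE: {{ ansible_play_hosts_all | difference(ansible_play_hosts) | join(', ') }}"
        success_msg: ALL hosts are REACHABLE, go on
      run_once: yes
  roles:
    - deploy
```

Because both variables describe the current play rather than the inventory, this sketch should behave the same with or without --limit.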
Answer 2:
You could be a bit more explicit about the check:
- fail: msg="Abort if hosts are unreachable"
  when: "'ansible_hostname' not in hostvars[item]"
  with_items: "{{ groups['all'] }}"
I thought you could write a callback plugin to achieve this. Something like:
class CallbackModule(object):
    def runner_on_unreachable(self, host, res):
        raise Exception("Aborting due to unreachable host " + host)
Except I can't find any good way to abort the entire playbook from that callback: the exception doesn't work, the return value is ignored, and while you could probably abuse self.playbook to stop things, there's no public API I can see.
Answer 3:
You can combine any_errors_fatal: true or max_fail_percentage: 0 with gather_facts: false, and then run a task that will fail if the host is offline. Something like this at the top of the playbook should do what you need:
- hosts: all
  gather_facts: false
  max_fail_percentage: 0
  tasks:
    - action: ping
A bonus is that this also works with the -l SUBSET option for limiting matching hosts.
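For illustration, here is how the any_errors_fatal form of this answer might look when it is placed as a separate gate play in front of the real work. This is a minimal sketch, assuming the default linear strategy, where an unreachable host also triggers any_errors_fatal and the playbook stops before the next play. The second play and its debug task are hypothetical placeholders.

```yaml
# Play 1: connectivity gate. Facts are not gathered, so the ping task is the
# first thing that touches each host.
- hosts: all
  gather_facts: false
  any_errors_fatal: true
  tasks:
    - name: Verify that every targeted host answers
      ping:

# Play 2: the real work (hypothetical placeholder task).
- hosts: all
  tasks:
    - name: Placeholder for the actual release steps
      debug:
        msg: "All hosts were reachable"
```

Keeping the gate in its own play means the later plays can still gather facts normally once every host has answered.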
Answer 4:
I found a way to use a callback to abort the play as soon as fact gathering has completed.
Setting _play_hosts to an empty list leaves no hosts for the play to continue with.
class CallbackModule(object):
    def runner_on_unreachable(self, host, res):
        # Truncate play_hosts to an empty list to fail quickly
        self.play._play_hosts = []
The result is something like:
PLAY [test] *******************************************************************
GATHERING FACTS ***************************************************************
fatal: [haderp] => SSH Error: ssh: Could not resolve hostname haderp: nodename nor servname provided, or not known
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
ok: [derp]
TASK: [set a fact] ************************************************************
FATAL: no hosts matched or all hosts have already failed -- aborting
PLAY RECAP ********************************************************************
to retry, use: --limit @/Users/jkeating/foo.yaml.retry
derp : ok=1 changed=0 unreachable=0 failed=0
haderp : ok=0 changed=0 unreachable=1 failed=0
Answer 5:
Inspired by the other answers.
Using ansible-playbook 2.7.8.
Checking whether there are any ansible_facts for each required host feels more explicit to me.
# my-playbook.yml
- hosts: myservers
  tasks:
    - name: Check ALL hosts are reacheable before doing the release
      fail:
        msg: >
          [REQUIRED] ALL hosts to be reachable, so flagging {{ inventory_hostname }} as failed,
          because host {{ item }} has no facts, meaning it is UNREACHABLE.
      when: "hostvars[item].ansible_facts|list|length == 0"
      with_items: "{{ groups.myservers }}"
    - debug:
        msg: "Will only run if all hosts are reacheable"
$ ansible-playbook -i my-inventory.yml my-playbook.yml
PLAY [myservers] *************************************************************************************************************************************************************************************************************
TASK [Gathering Facts] *********************************************************************************************************************************************************************************************************
fatal: [my-host-03]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname my-host-03: Name or service not known", "unreachable": true}
fatal: [my-host-04]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname my-host-04: Name or service not known", "unreachable": true}
ok: [my-host-02]
ok: [my-host-01]
TASK [Check ALL hosts are reacheable before doing the release] ********************************************************************************************************************************************************************************************************************
failed: [my-host-01] (item=my-host-03) => {"changed": false, "item": "my-host-03", "msg": "[REQUIRED] ALL hosts to be reachable, so flagging my-host-01 as failed, because host my-host-03 has no facts, meaning it is UNREACHABLE."}
failed: [my-host-01] (item=my-host-04) => {"changed": false, "item": "my-host-04", "msg": "[REQUIRED] ALL hosts to be reachable, so flagging my-host-01 as failed, because host my-host-04 has no facts, meaning it is UNREACHABLE."}
failed: [my-host-02] (item=my-host-03) => {"changed": false, "item": "my-host-03", "msg": "[REQUIRED] ALL hosts to be reachable, so flagging my-host-02 as failed, because host my-host-03 has no facts, meaning it is UNREACHABLE."}
failed: [my-host-02] (item=my-host-04) => {"changed": false, "item": "my-host-04", "msg": "[REQUIRED] ALL hosts to be reachable, so flagging my-host-02 as failed, because host my-host-04 has no facts, meaning it is UNREACHABLE."}
skipping: [my-host-01] => (item=my-host-01)
skipping: [my-host-01] => (item=my-host-02)
skipping: [my-host-02] => (item=my-host-01)
skipping: [my-host-02] => (item=my-host-02)
to retry, use: --limit @./my-playbook.retry
PLAY RECAP *********************************************************************************************************************************************************************************************************************
my-host-01 : ok=1 changed=0 unreachable=0 failed=1
my-host-02 : ok=1 changed=0 unreachable=0 failed=1
my-host-03 : ok=0 changed=0 unreachable=1 failed=0
my-host-04 : ok=0 changed=0 unreachable=1 failed=0