Question:
I'm trying to reboot a server running CentOS 7 on VirtualBox. I use this task:
- name: Restart server
  command: /sbin/reboot
  async: 0
  poll: 0
  ignore_errors: true
The server reboots, but I get this error:
TASK: [common | Restart server] ***********************************************
fatal: [rolcabox] => SSH Error: Shared connection to 127.0.0.1 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
FATAL: all hosts have already failed -- aborting
What am I doing wrong? How can I fix this?
Answer 1:
You're likely not doing anything truly wrong; it's just that /sbin/reboot shuts the server down so quickly that the server tears down the SSH connection used by Ansible before Ansible itself can close it. As a result, Ansible reports an error because it sees the SSH connection fail for an unexpected reason.
What you might want to do to get around this is to switch from /sbin/reboot to /sbin/shutdown instead. The shutdown command lets you pass a time, and when combined with the -r switch it performs a reboot rather than actually shutting down. So you might want to try a task like this:
- name: Restart server
  command: /sbin/shutdown -r +1
  async: 0
  poll: 0
  ignore_errors: true
This delays the server reboot by 1 minute, but in doing so it should give Ansible enough time to close the SSH connection itself, thereby avoiding the error you're currently getting.
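Because the reboot is scheduled rather than immediate, the play keeps going while the host is still up. One way to handle that, sketched here assuming Ansible 2.3 or later (for wait_for_connection) and a host that comes back within five minutes, is to pause on the control machine past the scheduled time and then wait for the host to become reachable again. The 90-second pause and the become setting are illustrative assumptions, not part of the answer above.

```yaml
# Sketch only: schedule the reboot, then wait it out (assumes Ansible >= 2.3).
- name: Restart server in one minute
  command: /sbin/shutdown -r +1
  become: true   # assumption: the SSH user is not root and needs escalation

# pause runs on the control machine, so it does not touch the remote host.
# 90 seconds is an illustrative value chosen to get past the +1 minute mark.
- name: Give the scheduled reboot time to start
  pause:
    seconds: 90

# Keeps retrying until Ansible can connect again, failing after 300 seconds.
- name: Wait for the server to come back
  wait_for_connection:
    timeout: 300
```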
Answer 2:
After the reboot task, you should have a local_action task that waits for the remote host to finish rebooting; otherwise, the SSH connection will be terminated, and so will the playbook.
- name: Reboot server
  command: /sbin/reboot

- name: Wait for the server to finish rebooting
  sudo: no
  local_action: wait_for host="{{ inventory_hostname }}" search_regex=OpenSSH port=22 timeout=300
I also wrote a blog post about achieving a similar solution: https://oguya.github.io/linux/2015/02/22/ansible-reboot-servers/
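The sudo keyword used above has since been deprecated in favor of become, and local_action is commonly written as delegate_to: localhost. A minimal sketch of the same wait step in that style, keeping the answer's parameters; the delay value is an added assumption so the check does not find the old sshd still listening before the host has gone down.

```yaml
# Same wait as above, written with become/delegate_to instead of sudo/local_action.
- name: Wait for the server to finish rebooting
  wait_for:
    host: "{{ inventory_hostname }}"
    port: 22
    search_regex: OpenSSH   # succeed only once the SSH banner can be read
    delay: 10               # assumed value: give the host time to go down first
    timeout: 300
  delegate_to: localhost    # run the check from the control machine
  become: false             # no privilege escalation needed on the controller
```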
Answer 3:
- name: restart server
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  become: true
  ignore_errors: true

- name: waiting for the server to come back
  local_action: wait_for host=testcentos state=started delay=30 timeout=300
  sudo: false
Answer 4:
Another solution:
- name: reboot host
  command: /usr/bin/systemd-run --on-active=10 /usr/bin/systemctl reboot
  async: 0
  poll: 0

- name: wait for host sshd
  local_action: wait_for host="{{ inventory_hostname }}" search_regex=OpenSSH port=22 timeout=300 delay=30
systemd-run creates a new service "on the fly" that runs systemctl reboot after a 10-second delay (--on-active=10).
The delay=30 in wait_for adds an extra 20 seconds to make sure the host has actually started rebooting.
Answer 5:
None of the above solutions worked reliably for me.
Issuing a /sbin/reboot crashes the play (the SSH connection is closed before Ansible finishes the task, and it crashes even with ignore_errors: true), and /usr/bin/systemd-run --on-active=2 /usr/bin/systemctl reboot does not reboot after 2 seconds but after a random amount of time between 20 seconds and one minute, so the delay is sometimes insufficient and is not predictable.
Also, I don't want to wait for minutes when a cloud server can reboot in a few seconds.
So here is my solution:
- name: Reboot the server for kernel update
  shell: ( sleep 3 && /sbin/reboot & )
  async: 0
  poll: 0

- name: Wait for the server to reboot
  local_action: wait_for host="{{ansible_host}}" delay=15 state=started port="{{ansible_port}}" connect_timeout=10 timeout=180
It's the shell: ( sleep 3 && /sbin/reboot & ) line that does the trick.
Using ( command & ) in a shell script runs a program in the background and detaches it: the command succeeds immediately but persists after the shell is destroyed.
Ansible gets its response immediately, and the server reboots 3 seconds later.
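The wait task above assumes ansible_host and ansible_port are set in the inventory; if they are not, templating can fail with an undefined-variable error. A hedged variant that falls back to the inventory name and to port 22, assuming those defaults match your setup:

```yaml
# Same wait as above, with fallbacks for hosts whose inventory entry
# does not define ansible_host or ansible_port.
- name: Wait for the server to reboot
  wait_for:
    host: "{{ ansible_host | default(inventory_hostname) }}"
    port: "{{ ansible_port | default(22) }}"
    delay: 15            # let the host actually go down before polling
    state: started       # wait until the port accepts connections again
    connect_timeout: 10
    timeout: 180
  delegate_to: localhost
```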
Answer 6:
Ansible is developing quickly, and the older answers were not working for me.
I found two issues:
- The recommended way of rebooting may kill the SSH connection before Ansible finishes the task. It is better to run: nohup bash -c "sleep 2s && shutdown -r now" &
  This launches a shell that runs sleep && shutdown, but because of the trailing &, the task does not wait for that shell to end. The sleep gives the Ansible task time to finish before the reboot, and nohup guarantees that bash doesn't get killed when the task ends.
- The wait_for module does not reliably wait for the SSH service. It detects the port as open (probably opened by systemd), but when the next task runs, SSH is still not ready. If you're using Ansible 2.3+, wait_for_connection works reliably.
The best 'reboot and wait' in my experience (I am using Ansible 2.4) is the following:
- name: Reboot the machine
  shell: nohup bash -c "sleep 2s && shutdown -r now" &

- name: Wait for machine to come back
  wait_for_connection:
    timeout: 240
    delay: 20
I took the nohup command from: https://github.com/keithchambers/microservices-playground/blob/master/playbooks/upgrade-packages.yml
I edited this message to:
- add krad's portability suggestion, using shutdown -r now instead of reboot
- add a delay, which is needed to keep Ansible from executing the next step if the reboot is slow
- increase the timeout, since 120 s was too short for some slow BIOSes.
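The delay guards against wait_for_connection succeeding before the host has actually gone down, but it only relies on timing. If you want to detect that case, one option, sketched here assuming a Linux target, is to compare the kernel's per-boot ID in /proc/sys/kernel/random/boot_id before and after the reboot. This detects a missed reboot rather than waiting for it; the task names and the boot_before/boot_after variables are illustrative.

```yaml
# Record the current boot ID; the Linux kernel generates a new one on every boot.
- name: Read boot ID before reboot
  command: cat /proc/sys/kernel/random/boot_id
  register: boot_before
  changed_when: false

- name: Reboot the machine
  shell: nohup bash -c "sleep 2s && shutdown -r now" &

- name: Wait for machine to come back
  wait_for_connection:
    timeout: 240
    delay: 20

- name: Read boot ID after reboot
  command: cat /proc/sys/kernel/random/boot_id
  register: boot_after
  changed_when: false

# Fails the play if we reconnected to the same boot, i.e. the reboot had not happened yet.
- name: Check that the machine really rebooted
  assert:
    that:
      - boot_before.stdout != boot_after.stdout
```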
Answer 7:
Yet another version, combined from other answers:
---
- name: restart server
  command: /usr/bin/systemd-run --on-active=5 --timer-property=AccuracySec=100ms /usr/bin/systemctl reboot
  async: 0
  poll: 0
  ignore_errors: true
  become: yes

- name: wait for server {{ ansible_ssh_host | default(inventory_hostname) }} to come back online
  wait_for:
    port: 22
    state: started
    host: '{{ ansible_ssh_host | default(inventory_hostname) }}'
    delay: 30
  delegate_to: localhost
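A note on the systemd-run options, which may also explain the 20-seconds-to-a-minute delays reported in an earlier answer: systemd timers default to AccuracySec=1min, so systemd may fire the transient timer anywhere within a minute after the requested time, and AccuracySec=100ms narrows that window. The sketch below is the same two tasks with comments, using ansible_host, the newer name for ansible_ssh_host (assuming Ansible 2.0 or later).

```yaml
---
# systemd-run creates a transient timer that starts "systemctl reboot" 5 s from now.
# AccuracySec=100ms overrides systemd's default 1 min timer accuracy window.
- name: restart server
  command: /usr/bin/systemd-run --on-active=5 --timer-property=AccuracySec=100ms /usr/bin/systemctl reboot
  async: 0
  poll: 0
  ignore_errors: true
  become: yes

# ansible_host replaces the older ansible_ssh_host variable name.
- name: wait for server {{ ansible_host | default(inventory_hostname) }} to come back online
  wait_for:
    port: 22
    state: started
    host: '{{ ansible_host | default(inventory_hostname) }}'
    delay: 30   # skip the window where the old sshd may still be listening
  delegate_to: localhost
```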
Answer 8:
At reboot time, all SSH connections are closed. That's why the Ansible task fails. Adding ignore_errors: true or failed_when: false no longer works as of Ansible 1.9.x, because the handling of SSH connections has changed and a closed connection is now a fatal error that cannot be caught during the play.
The only way I found to do it is to run a local shell task that starts a separate SSH connection, which is then allowed to fail.
- name: Rebooting
  delegate_to: localhost
  shell: ssh -S "none" {{ inventory_hostname }} "sudo /usr/sbin/reboot"
  failed_when: false
  changed_when: true
Answer 9:
I am using Ansible 2.5.3. The code below works with ease:
- name: Rebooting host
  shell: 'shutdown -r +1 "Reboot triggered by Ansible"'

- wait_for_connection:
    delay: 90
    timeout: 300
You can reboot immediately, then insert a delay if your machine takes a while to go down:
- name: Rebooting host
  shell: 'shutdown -r now "Reboot triggered by Ansible"'
  async: 1
  poll: 1
  ignore_errors: true

# Wait 120 seconds to make sure the machine won't connect immediately in the next section.
- name: Delay for the host to go down
  local_action: shell /bin/sleep 120
Then poll to make the playbook return as soon as possible:
- name: Wait for the server to finish rebooting
  wait_for_connection:
    delay: 15
    sleep: 15
    timeout: 300
This will make the playbook return as soon as possible after the reboot.
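Instead of the fixed 120-second sleep, you could wait for the SSH port to close, which signals that the host is actually going down. A sketch, assuming SSH listens on port 22 and that the shutdown takes long enough for this check to see the port closed; on a very fast reboot the port may reopen before the task polls, in which case it would wait until its timeout and fail.

```yaml
# Poll from the control machine until port 22 stops accepting connections,
# i.e. sshd on the rebooting host has shut down. Fails after 120 seconds.
- name: Wait for the host to go down
  wait_for:
    host: "{{ ansible_host | default(inventory_hostname) }}"
    port: 22
    state: stopped
    timeout: 120
  delegate_to: localhost
  become: false

# Then reconnect as soon as the host is reachable again.
- name: Wait for the server to finish rebooting
  wait_for_connection:
    delay: 15
    sleep: 15
    timeout: 300
```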
Answer 10:
The following solution works perfectly for me:
- name: Restart machine
  shell: "sleep 5 && sudo shutdown -r now"
  async: 1
  poll: 0

- name: wait for ssh again available.
  wait_for_connection:
    connect_timeout: 20
    sleep: 5
    delay: 5
    timeout: 300
The sleep is required because Ansible needs a few seconds to wrap up the connection.
An excellent post about this problem was written here:
https://www.jeffgeerling.com/blog/2018/reboot-and-wait-reboot-complete-ansible-playbook
Answer 11:
If you're using Ansible 2.7 or later, you can use the reboot module as described in the Ansible documentation.
The synopsis of the reboot module itself:
Reboot a machine, wait for it to go down, come back up, and respond to commands.
In its simplest form, you can define a task like this:
- name: reboot server
  reboot:
You can also add parameters such as test_command to check whether your server is ready to take further tasks:
- name: reboot server
  reboot:
    test_command: whoami
Hope this helps!
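The reboot module also accepts options for tuning its timing. The values below are illustrative, not recommendations, and the become setting assumes a non-root connection user; check the module documentation for your Ansible version for defaults and for options added after 2.7.

```yaml
# Illustrative values only; see the reboot module docs for defaults.
- name: reboot server
  reboot:
    msg: "Reboot triggered by Ansible"  # message shown to logged-in users before the reboot
    connect_timeout: 5                  # seconds per connection attempt before retrying
    reboot_timeout: 600                 # give up if the host is not back within 10 minutes
    post_reboot_delay: 10               # wait after issuing the reboot, before validating it
    test_command: whoami                # must succeed before the task reports success
  become: true                          # assumption: the connection user is not root
```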