How to reboot CentOS 7 with Ansible?

Posted 2019-02-02 04:01

Question:

I'm trying to reboot a server running CentOS 7 on VirtualBox. I use this task:

- name: Restart server
  command: /sbin/reboot
  async: 0
  poll: 0
  ignore_errors: true

The server reboots, but I get this error:

TASK: [common | Restart server] ***********************************************
fatal: [rolcabox] => SSH Error: Shared connection to 127.0.0.1 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

FATAL: all hosts have already failed -- aborting

What am I doing wrong? How can I fix this?

Answer 1:

You're likely not doing anything truly wrong; it's just that /sbin/reboot shuts the server down so quickly that the server tears down the SSH connection used by Ansible before Ansible itself can close it. As a result, Ansible reports an error because it sees the SSH connection fail for an unexpected reason.

To get around this, switch from /sbin/reboot to /sbin/shutdown. The shutdown command lets you pass a time, and when combined with the -r switch it performs a reboot rather than an actual shutdown. So you might want to try a task like this:

- name: Restart server
  command: /sbin/shutdown -r +1
  async: 0
  poll: 0
  ignore_errors: true

This delays the reboot by one minute, but in doing so it should give Ansible enough time to close the SSH connection itself, thereby avoiding the error you're currently getting.
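
If the play needs to keep working with the same host after the reboot, a follow-up wait task along these lines is commonly added (a sketch only; the delay accounts for the one-minute shutdown timer, and the host/port values are assumptions):

- name: Wait for the server to come back
  local_action: wait_for host="{{ inventory_hostname }}" port=22 delay=75 timeout=300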



Answer 2:

After the reboot task, you should have a local_action task that waits for the remote host to finish rebooting; otherwise the SSH connection will be terminated, and so will the playbook.


- name: Reboot server
  command: /sbin/reboot

- name: Wait for the server to finish rebooting
  sudo: no
  local_action: wait_for host="{{ inventory_hostname }}" search_regex=OpenSSH port=22 timeout=300

I also wrote a blog post about achieving a similar solution: https://oguya.github.io/linux/2015/02/22/ansible-reboot-servers/



Answer 3:

- name: restart server
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  become: true
  ignore_errors: true


- name: waiting for the server to come back
  local_action: wait_for host=testcentos state=started delay=30 timeout=300
  sudo: false


Answer 4:

Another solution:

- name: reboot host
  command: /usr/bin/systemd-run --on-active=10 /usr/bin/systemctl reboot
  async: 0
  poll: 0

- name: wait for host sshd
  local_action: wait_for host="{{ inventory_hostname }}" search_regex=OpenSSH port=22 timeout=300 delay=30

systemd-run creates a transient ("on the fly") service that will run systemctl reboot after a 10-second delay (--on-active=10). The delay=30 in wait_for adds an extra 20 seconds to be sure the host has actually started rebooting.
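
If you want to confirm the transient timer was actually scheduled before the connection drops, you could check on the target; this is purely illustrative, and assumes the usual run-<id>.timer naming that systemd-run gives transient units:

# Pending timers on the target; the transient unit typically shows up as run-<id>.timer
systemctl list-timers --all | grep 'run-'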



Answer 5:

None of the above solutions worked reliably for me.

Issuing /sbin/reboot crashes the play (the SSH connection is closed before Ansible finishes the task; it crashes even with ignore_errors: true), and /usr/bin/systemd-run --on-active=2 /usr/bin/systemctl reboot does not reboot after 2 seconds but after a random amount of time between 20 seconds and one minute, so the delay is sometimes not sufficient and the behavior is not predictable.

Also, I don't want to wait for minutes when a cloud server can reboot in a few seconds.

So here is my solution:

- name: Reboot the server for kernel update
  shell: ( sleep 3 && /sbin/reboot & )
  async: 0
  poll: 0 

- name: Wait for the server to reboot
  local_action: wait_for host="{{ansible_host}}" delay=15 state=started port="{{ansible_port}}" connect_timeout=10 timeout=180

It's the shell: ( sleep 3 && /sbin/reboot & ) line that does the trick.

Using ( command & ) in a shell script runs the command in the background and detaches it: the subshell returns immediately, but the command keeps running after the shell is destroyed.

Ansible gets its response immediately, and the server reboots 3 seconds later.
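
To see the detach pattern in isolation, you can try it with a harmless stand-in for /sbin/reboot (the file path here is just an example):

# The subshell returns immediately; the backgrounded command keeps running
# after the parent shell exits and creates the file about 3 seconds later.
( sleep 3 && touch /tmp/detach-demo & )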



Answer 6:

Ansible is developing quickly and the older answers were not working for me.

I found two issues:

  • The recommended way of rebooting may kill the SSH connection before Ansible finishes the task.

It is better to run: nohup bash -c "sleep 2s && shutdown -r now" &

This launches a shell running the sleep && shutdown, but does not wait for that shell to finish because of the trailing &. The sleep gives the Ansible task some time to end before the reboot, and the nohup guarantees that bash doesn't get killed when the task ends.

  • The wait_for module does not reliably wait for the SSH service.

It detects the port as open (probably opened by systemd), but when the next task runs, SSH is still not ready.

If you're using Ansible 2.3+, wait_for_connection works reliably.

The best 'reboot and wait' in my experience (I am using Ansible 2.4) is the following:

- name: Reboot the machine
  shell: nohup bash -c "sleep 2s && shutdown -r now" &

- name: Wait for machine to come back
  wait_for_connection:
    timeout: 240
    delay: 20

I got the nohup command from: https://github.com/keithchambers/microservices-playground/blob/master/playbooks/upgrade-packages.yml

I edited this message to:

  • add krad's portability suggestion, using shutdown -r now instead of reboot
  • add a delay. It is needed to keep Ansible from executing the next step if the reboot is slow
  • increase the timeout; 120s was too little for some slow BIOSes.


Answer 7:

Yet another (combined from other answers) version:

---
- name: restart server
  command: /usr/bin/systemd-run --on-active=5 --timer-property=AccuracySec=100ms /usr/bin/systemctl reboot
  async: 0
  poll: 0
  ignore_errors: true
  become: yes

- name: wait for server {{ ansible_ssh_host | default(inventory_hostname) }} to come back online
  wait_for:
    port: 22
    state: started
    host: '{{ ansible_ssh_host | default(inventory_hostname) }}'
    delay: 30
  delegate_to: localhost


Answer 8:

At reboot time, all SSH connections are closed. That's why the Ansible task fails. The ignore_errors: true or failed_when: false additions no longer work as of Ansible 1.9.x because the handling of SSH connections has changed, and a closed connection is now a fatal error which cannot be caught during the play.

The only way I found to do it is to run a local shell task that starts a separate ssh connection, which is then allowed to fail.

- name: Rebooting
  delegate_to: localhost
  shell: ssh -S "none" {{ inventory_hostname }} "sudo /usr/sbin/reboot"
  failed_when: false
  changed_when: true
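
This task only triggers the reboot; you will usually still want a wait step afterwards, for example (a sketch using wait_for_connection, which requires Ansible 2.3+):

- name: Wait for the host to come back after the out-of-band reboot
  wait_for_connection:
    delay: 20
    timeout: 300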


Answer 9:

I am using Ansible 2.5.3. The code below works with ease:

- name: Rebooting host
  shell: 'shutdown -r +1 "Reboot triggered by Ansible"'

- wait_for_connection:
    delay: 90
    timeout: 300

You can reboot immediately, then insert a delay if your machine takes a while to go down:

    - name: Rebooting host
      shell: 'shutdown -r now "Reboot triggered by Ansible"'
      async: 1
      poll: 1
      ignore_errors: true

    # Wait 120 seconds to make sure the machine won't connect immediately in the next section.
    - name: Delay for the host to go down
      local_action: shell /bin/sleep 120

Then poll to make the playbook return as soon as possible:

    - name: Wait for the server to finish rebooting
      wait_for_connection:
        delay: 15
        sleep: 15
        timeout: 300

This will make the playbook return as soon as possible after the reboot.



Answer 10:

The following solution works perfectly for me:

- name: Restart machine
  shell: "sleep 5 && sudo shutdown -r now"
  async: 1
  poll: 0

- name: Wait for SSH to be available again
  wait_for_connection:
    connect_timeout: 20
    sleep: 5
    delay: 5
    timeout: 300

The sleep is required because Ansible needs a few seconds to wrap up the connection. An excellent post about this problem was written here: https://www.jeffgeerling.com/blog/2018/reboot-and-wait-reboot-complete-ansible-playbook



Answer 11:

If you're using Ansible version >= 2.7, you can use the reboot module, as described here.

The synopsis of the reboot module itself:

Reboot a machine, wait for it to go down, come back up, and respond to commands.

In the simplest case, you can define a task like this:

    - name: reboot server
      reboot:

But you can add some parameters, like test_command, to test whether your server is ready to take further tasks:

    - name: reboot server
      reboot:
        test_command: whoami
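
If your hosts are slow to come back, the module also accepts a reboot_timeout (a sketch; 600 seconds is the module's documented default, adjust as needed):

    - name: reboot server
      reboot:
        reboot_timeout: 600
        test_command: whoami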

Hope this helps!