fabric: Reboot broken on Ubuntu 16.04 hosts

The built-in reboot() function has been working perfectly on both Ubuntu 14.04 and FreeBSD 10.x hosts, but it is broken on Ubuntu 16.04 hosts.

What is happening on Ubuntu 14.04: I receive output like this and the system reboots; after the reboot, Fabric reconnects.

[ubuntu] out:
[ubuntu] out:
[ubuntu] out: Broadcast message from root@ubuntu
[ubuntu] out:
[ubuntu] out:   (/dev/pts/0) at 15:02 ...
[ubuntu] out:
[ubuntu] out:
[ubuntu] out:
[ubuntu] out:
[ubuntu] out: The system is going down for reboot NOW!
[ubuntu] out:
[ubuntu] out:

What is happening on Ubuntu 16.04:

  1. There is no output at all from the command.
  2. The system actually starts rebooting (still no output in Fabric)
  3. The system finishes rebooting, but Fabric doesn’t realise it: it does not reconnect, and there is still no output.
  4. Fabric just sits there waiting seemingly forever.

If I press the Enter key in this state, Fabric actually continues, but first shows this message:

No handlers could be found for logger "paramiko.transport"
Warning: sudo() received nonzero return code -1 while executing 'reboot'!

I am using this code for reboot:

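import time

from fabric.api import reboot, settings


def reboot_():
    # warn_only keeps Fabric from aborting if the reboot command returns non-zero.
    with settings(warn_only=True):
        print 'rebooting'
        start_time = time.time()
        reboot(wait=1200)  # allow up to 1200 seconds for the host to come back
        print 'reboot took: {} seconds'.format(time.time() - start_time)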

About this issue

  • State: open
  • Created 8 years ago
  • Reactions: 4
  • Comments: 21 (2 by maintainers)

Most upvoted comments

The Ubuntu bug https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/1645002 is marked as fixed in 16.10 but not yet in 16.04, and it is unclear when it will be.

The current behavior for me is that paramiko/fabric instantly detects that the ssh connection was closed, but this happens before paramiko/fabric sees the reboot command complete. At least it doesn’t hang indefinitely as in the original report.

Fatal error: sudo() received nonzero return code -1 while executing!
...
Aborting.

Plain reboot() did that consistently for me in a handful of tests against AWS EC2 and a local VirtualBox VM. (I always used keyfile auth.)

I’ve found a short and elegant workaround, as I suggested without as much detail above:

reboot(command="shutdown -r +0")

That worked as expected for me (in my handful of tests against AWS EC2 and a local VirtualBox VM, all running up-to-date Ubuntu 16.04). Note that “shutdown -r now” behaved like “reboot” and did not seem to work.

I took a quick look at the FreeBSD and OpenBSD man pages, and it looks like they have a shutdown command that supports those parameters. I suspect that the command “shutdown -r +0” would work on pretty much any Unix system on which “reboot” worked. So it could be considered as a new default command, or as a documentation update. (But I’d be interested to see a report of a test on a BSD system first.)

The problem seems to be that “reboot” is now sometimes “too fast”: the connection drops before the status of the command gets back over the ssh connection.

(Tip: If you’re at a frozen ssh connection as a result: type \n~. aka enter, tilde, period. That’s the default ssh escape character, then the disconnect command for ssh. If you just try ctrl-c or ctrl-d, ssh tries to pass that to the process running on the other side.)

One solution is to use shutdown -r +1, which will schedule the reboot for the next minute, and then wait a minute for it to start, and then start trying to re-connect. Admittedly, waiting a minute is not great.
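
The "wait, then start trying to re-connect" step could be sketched roughly as below. This is a minimal, standard-library-only illustration and not part of Fabric: the helper name wait_for_port and all of its parameters are hypothetical, and it only checks that a TCP port (for example SSH on port 22) accepts connections again, not that a login would succeed.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=5.0, connect_timeout=3.0):
    """Poll until host:port accepts a TCP connection or the timeout elapses.

    Hypothetical helper: returns True once a connection succeeds and
    False if the deadline passes first.
    """
    deadline = time.time() + timeout
    while True:
        try:
            # A successful connect only shows that the port is open again.
            sock = socket.create_connection((host, port), connect_timeout)
            sock.close()
            return True
        except (socket.error, socket.timeout):
            if time.time() >= deadline:
                return False
            # Back off briefly before the next attempt.
            time.sleep(interval)
```

With this approach, one would issue shutdown -r +1, sleep for roughly a minute so the host has actually gone down, and only then call something like wait_for_port(host, 22) before opening a new connection. Polling too early risks connecting to the host before it has started shutting down.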

A hacky thing to try: shutdown -r +0 should be equivalent to reboot, but in my limited tests of Ubuntu 16.04 running in VirtualBox, it tends to allow a fraction of a second longer, showing the next shell prompt just before disconnecting a manual ssh session.

I’ve just run into broken reboot (fresh up-to-date Ubuntu 16.04 on AWS, Fabric==1.12.0) but in a different way. For me it just throws:

Fatal error: sudo() received nonzero return code -1 while executing!

Requested: reboot
Executed: sudo -S -p 'sudo password:'  /bin/bash -l -c "reboot"

Running sudo reboot in terminal by hand works (host reboots).

FYI, I still see this against 18.04.2 LTS servers.

Apologies for reviving an old issue, but I can also confirm that this problem happens when attempting to reboot an LXC container. @ploxiln’s suggestion of using command="shutdown -r +0" did work for us.

Indeed, you need to replace the existing connection, the way reboot() does:

https://github.com/fabric/fabric/blob/1.13.2/fabric/operations.py#L1289-L1294
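
The general shape of "issue the command, pause, then replace the connection" might look like the sketch below. It is an illustration only and not Fabric's actual code: the function name reboot_and_reconnect and its parameters are hypothetical, and the command runner and reconnect step are passed in as callables so the sketch does not depend on any particular library.

```python
import time


def reboot_and_reconnect(run_command, reconnect, command="shutdown -r +0",
                         settle=5, sleep=time.sleep):
    """Run a reboot command, pause, then force a fresh connection.

    Hypothetical helper:
      run_command -- callable that executes a command string on the host
      reconnect   -- callable that replaces the stale connection
      settle      -- seconds to wait so we do not reconnect before the
                     host has actually started going down
    """
    # The connection usually drops here, so the caller should tolerate
    # a failing or nonzero result from run_command.
    run_command(command)
    # Give the host a moment to begin shutting down.
    sleep(settle)
    # Replace the old connection instead of reusing the dead one.
    reconnect()
```

In a Fabric 1.x fabfile, run_command might be sudo from fabric.api (called inside settings(warn_only=True) so the -1 return code does not abort the run), and reconnect might be lambda: connections.connect(env.host_string), with connections imported from fabric.state. That is my reading of what the linked lines do, so treat it as an assumption and check it against the source.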