mitogen: Ansible raw fails randomly

Quack,

I’m using master commit 876a82f and Ansible 2.5.5.

This task:

    - name: Install Python
      raw: test -e /usr/bin/python || (apt -qqy update && apt install -qqy python-minimal)
      register: output
      changed_when: output.stdout != ""

sometimes, randomly, fails with:

     11280 1529065165.21643: _execute() done
     11280 1529065165.21658: dumping result to json
     11280 1529065165.21672: done dumping result, returning
     11280 1529065165.21751: done running TaskExecutor() for Jinta/TASK: Install Python [5c514f2f-680e-8e01-2ae9-00000000000d]
     11280 1529065165.21782: sending task result for task 5c514f2f-680e-8e01-2ae9-00000000000d
     11280 1529065165.21838: done sending task result for task 5c514f2f-680e-8e01-2ae9-00000000000d
     11280 1529065165.21851: WORKER PROCESS EXITING
    fatal: [Jinta]: UNREACHABLE! => {"changed": false, "msg": "Connection timed out.", "unreachable": true}

Regards. \_o<

About this issue

  • State: open
  • Created 6 years ago
  • Comments: 18 (6 by maintainers)


Most upvoted comments

Note that the raw module currently requires Python to be installed on the target when running under Mitogen. Fixing that is essentially blocked on #419, to avoid adding any more spaghetti code.
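Until that is fixed, one possible way to keep a bootstrap step like the one above working is to run it in its own play that bypasses Mitogen. The sketch below assumes Mitogen was enabled by setting `strategy = mitogen_linear` in ansible.cfg. It uses Ansible's play-level `strategy` keyword to fall back to the stock `linear` strategy for that play only. This is a hypothetical illustration, not a fix confirmed by the Mitogen maintainers.

```yaml
# Hypothetical bootstrap play. Assumes Mitogen is enabled globally via
# strategy = mitogen_linear in ansible.cfg.
- hosts: all
  gather_facts: false   # fact gathering itself needs Python on the target
  strategy: linear      # stock Ansible strategy: raw runs without Python
  tasks:
    - name: Install Python
      raw: test -e /usr/bin/python || (apt -qqy update && apt install -qqy python-minimal)
      register: output
      changed_when: output.stdout != ""

# Later plays keep the globally configured Mitogen strategy.
- hosts: all
  tasks:
    - name: Check that Mitogen can now connect
      ping:
```

The first play only needs SSH and a shell on the target. Once it has run, the remaining plays should be able to use Mitogen as usual.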

But that doesn’t seem to explain your situation. What kind of machine is it connecting to and how many machines are in a typical run?

We fixed some super scary bugs over the past 6 months, including several where FDs could be closed at random. This might explain it. At one point, bootstrap could fail if the machine was low on RAM /and/ the user was SSHing into an unprivileged container.

Another avenue is a difference in how Mitogen interprets ‘connected’ compared to Ansible. Mitogen does not mark a connection as connected until every step has completed, up to and including the remote interpreter saying hello. Is there some chance that the ‘python’ command is failing on that machine, while the SSH connection is somehow being held open anyway? For example, a crazy bashrc that backgrounds some process might do this. So could an SSH config with certain ControlMaster settings, where exiting cannot complete because other ControlMaster clients are still using the connection.
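One way to test the ‘python is failing’ theory directly is a small diagnostic play like the one below. It runs the interpreter through `raw` with the stock `linear` strategy, assuming Mitogen is otherwise enabled via `strategy = mitogen_linear`. If python errors out, the error should then appear in the registered output rather than as a connection timeout. The host name `Jinta` is taken from the log above. The play itself is a hypothetical sketch.

```yaml
# Hypothetical diagnostic play. It runs outside Mitogen, so raw does not
# need a working remote Python.
- hosts: Jinta
  gather_facts: false
  strategy: linear
  tasks:
    - name: Check whether the remote python starts and greets us
      raw: python -c 'print("hello")'
      register: py_check
      changed_when: false
      failed_when: false   # keep going so rc/stdout/stderr can be inspected

    - name: Show what came back (rc, stdout, stderr)
      debug:
        var: py_check
```

If this play also hangs, that would suggest looking at shell startup files or ControlMaster settings rather than at Python itself.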

I’m going to have a play with raw specifically, but just in case you can tickle it again, this strace wrapper technique is very effective at revealing startup problems that aren’t so easy to capture in a log.

I had to let go of Mitogen at some point, but I’m still interested. I certainly won’t bother you if you prefer to close this ticket, but in that case I think the documentation should make this limitation clear (along with the workaround).

That said, I’m still using the snippet above, updated for Python 3. I remember that on CentOS/RHEL, which have a system-python separate from the usual python package, I wanted to make sure I got the latter. The snippet is also useful if you have no access to VM preseeding (to install the package at that stage). You’re not always in charge of every piece of the infrastructure.
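For illustration, a Python 3 version of the snippet might look like the sketch below. It rests on several assumptions:

- Debian/Ubuntu targets get `python3-minimal` via apt.
- CentOS/RHEL 8 targets get `python3` via dnf. There, `/usr/bin/python3` comes from the regular package, while the system interpreter lives at `/usr/libexec/platform-python`.
- The play runs with the stock `linear` strategy, since under Mitogen raw currently needs Python itself.

The `PYTHON_INSTALLED` marker is a hypothetical addition. It makes the change detection explicit instead of relying on the package manager printing something.

```yaml
# Hypothetical Python 3 bootstrap for apt-based or dnf-based targets.
- hosts: all
  gather_facts: false   # facts need Python, which may not exist yet
  strategy: linear      # raw under Mitogen currently needs Python itself
  vars:
    ansible_python_interpreter: /usr/bin/python3
  tasks:
    - name: Install Python 3
      # Checks for /usr/bin/python3 (the regular package), not the
      # RHEL 8 system interpreter. Folded into a single shell line.
      raw: >-
        test -e /usr/bin/python3 ||
        { if command -v apt >/dev/null 2>&1;
        then apt -qqy update && apt install -qqy python3-minimal;
        else dnf -y -q install python3;
        fi && echo PYTHON_INSTALLED; }
      register: output
      changed_when: "'PYTHON_INSTALLED' in output.stdout"
```

Setting `ansible_python_interpreter` at play level means any later tasks in the play use the newly installed interpreter rather than whatever `python` resolves to.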