salt: "Minion did not return" executing state with long running command, 2016.3 regression

This is a setup I’ve been using for about a year, on a variety of 2015 versions, with no issue. Upgrading the minion to 2016.3.1 makes it stop working. I believe it’s long-running commands, but maybe something else specific to what I’m doing.

Take the following state:

gzcat /root/image-latest.img.gz | dd of=/dev/null bs=1M:
  cmd.run

Where image-latest.img.gz is a gzipped disk image file, which in normal operation is sent to dd with output to a hard drive, but for sake of example, /dev/null behaves in the same way. Any sufficiently large gz file (100 MB would suffice) will show the end result problem.

When executing that state from the master via salt my-deploy state.sls write-disk, the master claims the minion didn’t return.

[DEBUG   ] Reading configuration from /usr/local/etc/salt/master
[DEBUG   ] Using cached minion ID from /usr/local/etc/salt/minion_id: dev-salt-master.example.com
[DEBUG   ] Missing configuration file: /root/.saltrc
[DEBUG   ] Configuration file path: /usr/local/etc/salt/master
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Reading configuration from /usr/local/etc/salt/master
[DEBUG   ] Using cached minion ID from /usr/local/etc/salt/minion_id: dev-salt-master.example.com
[DEBUG   ] Missing configuration file: /root/.saltrc
[DEBUG   ] MasterEvent PUB socket URI: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: /var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/usr/local/etc/salt/pki/master', 'dev-salt-master.example.com_master', 'tcp://[::]:4506', 'clear')
[DEBUG   ] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] LazyLoaded local_cache.get_load
[DEBUG   ] Reading minion list from /var/cache/salt/master/jobs/66/c9f0cb3dfb31a4e0eae44ad32a108d/.minions.p
[DEBUG   ] get_iter_returns for jid 20160722001528110729 sent to set(['my-deploy']) will timeout at 00:15:33.119337
[DEBUG   ] Checking whether jid 20160722001528110729 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/usr/local/etc/salt/pki/master', 'dev-salt-master.example.com_master', 'tcp://[::]:4506', 'clear')
[DEBUG   ] LazyLoaded no_return.output
my-deploy:
    Minion did not return. [Not connected]

But if you watch the minion’s log, it’s running the command, and it finishes successfully. So the minion works, it just apparently doesn’t reply back to the master, leaving it thinking it disconnected.

Minion log when master reports “Minion did not return”.

[INFO    ] Running state [gzcat /root/image-latest.img.gz | dd of=/dev/null bs=1M] at time 00:15:37.880888
[INFO    ] Executing state cmd.run for gzcat /root/image-latest.img.gz | dd of=/dev/null bs=1M
[INFO    ] Executing command 'gzcat /root/image-latest.img.gz | dd of=/dev/null bs=1M' in directory '/root'
[INFO    ] {'pid': 2344, 'retcode': 0, 'stderr': '0+131072 records in\n8191+1 records out\n8589888000 bytes transferred in 33.619007 secs (255506891 bytes/sec)', 'stdout': ''}
[INFO    ] Completed state [gzcat /root/image-latest.img.gz | dd of=/dev/null bs=1M] at time 00:16:11.539727 duration_in_ms=33658.839
[INFO    ] Returning information for job: 20160722001528110729

Master log from 2015.8.1 which works correctly.

[DEBUG   ] Reading configuration from /usr/local/etc/salt/master
[DEBUG   ] Using cached minion ID from /usr/local/etc/salt/minion_id: dev-salt-master.example.com
[DEBUG   ] Missing configuration file: /root/.saltrc
[DEBUG   ] Configuration file path: /usr/local/etc/salt/master
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Reading configuration from /usr/local/etc/salt/master
[DEBUG   ] Using cached minion ID from /usr/local/etc/salt/minion_id: dev-salt-master.example.com
[DEBUG   ] Missing configuration file: /root/.saltrc
[DEBUG   ] MasterEvent PUB socket URI: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: /var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/usr/local/etc/salt/pki/master', 'dev-salt-master.example.com_master', 'tcp://[::]:4506', 'clear')
[DEBUG   ] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] LazyLoaded local_cache.get_load
[DEBUG   ] Reading minion list from /var/cache/salt/master/jobs/ac/a63465b4ec61ac13c59cb3545fa507/.minions.p
[DEBUG   ] get_iter_returns for jid 20160722042210217054 sent to set(['my-deploy']) will timeout at 04:22:15.225950
[DEBUG   ] Checking whether jid 20160722042210217054 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/usr/local/etc/salt/pki/master', 'dev-salt-master.example.com_master', 'tcp://[::]:4506', 'clear')
[DEBUG   ] Checking whether jid 20160722042210217054 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/usr/local/etc/salt/pki/master', 'dev-salt-master.example.com_master', 'tcp://[::]:4506', 'clear')
[DEBUG   ] Checking whether jid 20160722042210217054 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/usr/local/etc/salt/pki/master', 'dev-salt-master.example.com_master', 'tcp://[::]:4506', 'clear')
[DEBUG   ] Checking whether jid 20160722042210217054 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/usr/local/etc/salt/pki/master', 'dev-salt-master.example.com_master', 'tcp://[::]:4506', 'clear')
[DEBUG   ] jid 20160722042210217054 return from my-deploy
[DEBUG   ] LazyLoaded highstate.output
[DEBUG   ] LazyLoaded nested.output

Versions Report

Not working:

Salt Version:
           Salt: 2016.3.1

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.3
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.7
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.7.11 (default, Mar 17 2016, 01:24:00)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.2.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.3
            ZMQ: 4.1.4

System Versions:
           dist:
        machine: amd64
        release: 10.2-RELEASE-p14
         system: FreeBSD
        version: Not Installed

Working:

    Salt Version:
               Salt: 2015.8.1

    Dependency Versions:
             Jinja2: 2.8
           M2Crypto: Not Installed
               Mako: Not Installed
             PyYAML: 3.11
              PyZMQ: 14.7.0
             Python: 2.7.10 (default, Nov  5 2015, 01:15:39)
               RAET: Not Installed
            Tornado: 4.3
                ZMQ: 4.1.3
               cffi: Not Installed
           cherrypy: Not Installed
           dateutil: 2.3
              gitdb: Not Installed
          gitpython: Not Installed
              ioflo: Not Installed
            libnacl: Not Installed
       msgpack-pure: Not Installed
     msgpack-python: 0.4.2
       mysql-python: Not Installed
          pycparser: Not Installed
           pycrypto: 2.6.1
             pygit2: Not Installed
       python-gnupg: Not Installed
              smmap: Not Installed
            timelib: Not Installed

    System Versions:
               dist:
            machine: amd64
            release: 10.1-RELEASE-p24

The OS used doesn’t matter either way, FreeBSD 10.2 with any 2015 salt version works fine.

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 42 (26 by maintainers)

Most upvoted comments

Greetings. I think I fixed this. It turned out to be far simpler than we thought. Incidentally it was also broken on macOS. Let me know if I nailed it or not.