salt: KeyError: u'ret' when running salt -b x ... on salt master-of-masters.

Description of Issue/Question

Running salt -b 50 '*' state.highstate on the Salt master of masters raises an exception:

[ERROR   ] An un-handled exception was caught by salt's global exception handler:
KeyError: u'ret'
Traceback (most recent call last):
  File "/bin/salt", line 10, in <module>
    salt_main()
  File "/usr/lib/python2.7/site-packages/salt/scripts.py", line 485, in salt_main
    client.run()
  File "/usr/lib/python2.7/site-packages/salt/cli/salt.py", line 65, in run
    self._run_batch()
  File "/usr/lib/python2.7/site-packages/salt/cli/salt.py", line 283, in _run_batch
    for res in batch.run():
  File "/usr/lib/python2.7/site-packages/salt/cli/batch.py", line 262, in run
    ret[minion] = data['ret']
KeyError: u'ret'
Traceback (most recent call last):
  File "/bin/salt", line 10, in <module>
    salt_main()
  File "/usr/lib/python2.7/site-packages/salt/scripts.py", line 485, in salt_main
    client.run()
  File "/usr/lib/python2.7/site-packages/salt/cli/salt.py", line 65, in run
    self._run_batch()
  File "/usr/lib/python2.7/site-packages/salt/cli/salt.py", line 283, in _run_batch
    for res in batch.run():
  File "/usr/lib/python2.7/site-packages/salt/cli/batch.py", line 262, in run
    ret[minion] = data['ret']
KeyError: u'ret'

I added a logging statement after the line "for minion, data in six.iteritems(parts):" in /usr/lib/python2.7/site-packages/salt/cli/batch.py to print each minion and its data value. Whenever this exception happens, data is {'failed': True}, which has no 'ret' key.
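The failure mode described above can be reproduced outside Salt with a minimal sketch. It assumes Python 3 (so parts.items() stands in for six.iteritems(parts)); the minion names, the sample dictionary, and the collect function are hypothetical, and only the {'failed': True} shape comes from the logging output.

```python
# Hypothetical stand-in for the `parts` mapping that batch.run() iterates over.
parts = {
    'minion-a': {'ret': True, 'retcode': 0},  # a normal-looking return (assumed shape)
    'minion-b': {'failed': True},             # the shape observed when the exception occurs
}


def collect(parts):
    """Mimic the unguarded lookup from the traceback: ret[minion] = data['ret']."""
    ret = {}
    for minion, data in parts.items():
        ret[minion] = data['ret']  # raises KeyError when data is {'failed': True}
    return ret


error = None
try:
    collect(parts)
except KeyError as exc:
    error = exc
    print('KeyError:', exc)
```

Because the lookup is unguarded, a single minion reported as failed is enough to abort the whole loop.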

The minion it fails on is different each time.

Setup

MasterOfMaster(1) -> Syndic(3) -> Minion (800)

Minions are configured with 3 masters and master_type: failover, so each minion is connected to only one syndic at a time.

Steps to Reproduce Issue

The exception occurs with either of these commands:

salt -b 50 '*' state.highstate

salt -b 50 '*' state.highstate --static

Versions Report

master# salt --versions-report
Salt Version:
    Salt: 2018.3.2

Dependency Versions:
    cffi: 1.6.0
    cherrypy: Not Installed
    dateutil: 1.5
    docker-py: Not Installed
    gitdb: Not Installed
    gitpython: Not Installed
    ioflo: Not Installed
    Jinja2: 2.8
    libgit2: 0.26.3
    libnacl: 1.6.1
    M2Crypto: 0.28.2
    Mako: 0.8.1
    msgpack-pure: Not Installed
    msgpack-python: 0.5.1
    mysql-python: Not Installed
    pycparser: 2.14
    pycrypto: 2.6.1
    pycryptodome: Not Installed
    pygit2: 0.26.4
    Python: 2.7.5 (default, May 31 2018, 09:41:32)
    python-gnupg: Not Installed
    PyYAML: 3.11
    PyZMQ: 15.3.0
    RAET: Not Installed
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 4.2.1
    ZMQ: 4.1.4

System Versions:
    dist: redhat 7.5 Maipo
    locale: UTF-8
    machine: x86_64
    release: 3.10.0-862.6.3.el7.x86_64
    system: Linux
    version: Red Hat Enterprise Linux Server 7.5 Maipo

syndic# salt --versions-report
Salt Version:
    Salt: 2018.3.2

Dependency Versions:
    cffi: 1.6.0
    cherrypy: Not Installed
    dateutil: 1.5
    docker-py: Not Installed
    gitdb: Not Installed
    gitpython: Not Installed
    ioflo: Not Installed
    Jinja2: 2.8
    libgit2: 0.26.3
    libnacl: 1.6.1
    M2Crypto: 0.28.2
    Mako: 0.8.1
    msgpack-pure: Not Installed
    msgpack-python: 0.5.6
    mysql-python: Not Installed
    pycparser: 2.14
    pycrypto: 2.6.1
    pycryptodome: Not Installed
    pygit2: 0.26.4
    Python: 2.7.5 (default, May 31 2018, 09:41:32)
    python-gnupg: Not Installed
    PyYAML: 3.11
    PyZMQ: 15.3.0
    RAET: Not Installed
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 4.2.1
    ZMQ: 4.1.4

System Versions:
    dist: redhat 7.5 Maipo
    locale: UTF-8
    machine: x86_64
    release: 3.10.0-862.14.4.el7.x86_64
    system: Linux
    version: Red Hat Enterprise Linux Server 7.5 Maipo

minion# salt-call --versions-report
Salt Version:
    Salt: 2018.3.2

Dependency Versions:
    cffi: 1.6.0
    cherrypy: Not Installed
    dateutil: 1.5
    docker-py: Not Installed
    gitdb: Not Installed
    gitpython: Not Installed
    ioflo: Not Installed
    Jinja2: 2.7.2
    libgit2: Not Installed
    libnacl: Not Installed
    M2Crypto: 0.28.2
    Mako: 0.8.1
    msgpack-pure: Not Installed
    msgpack-python: 0.5.6
    mysql-python: Not Installed
    pycparser: 2.14
    pycrypto: 2.6.1
    pycryptodome: Not Installed
    pygit2: Not Installed
    Python: 2.7.5 (default, May 31 2018, 09:41:32)
    python-gnupg: Not Installed
    PyYAML: 3.11
    PyZMQ: 15.3.0
    RAET: Not Installed
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 4.2.1
    ZMQ: 4.1.4

System Versions:
    dist: redhat 7.5 Maipo
    locale: UTF-8
    machine: x86_64
    release: 3.10.0-862.14.4.el7.x86_64
    system: Linux
    version: Red Hat Enterprise Linux Server 7.5 Maipo

About this issue

  • State: open
  • Created 6 years ago
  • Comments: 24 (5 by maintainers)

Most upvoted comments

Another bump as well. This has been coming up more and more frequently on our deployment.

This little patch prevents the KeyError: u'ret' from being raised and causing the whole batch to fail. I have not tested it with raw output.

$ git diff
diff --git a/salt/cli/batch.py b/salt/cli/batch.py
index e3a7bf9..65ba130 100644
--- a/salt/cli/batch.py
+++ b/salt/cli/batch.py
@@ -255,6 +255,10 @@ class Batch(object):
                     if self.opts.get('failhard') and data['ret']['retcode'] > 0:
                         failhard = True

+                if data.get('failed') is True:
+                    log.debug('Minion %s failed to respond: data=%s', minion, data)
+                    data = {'ret': 'Minion did not return. [Failed]'}
+
                 if self.opts.get('raw'):
                     ret[minion] = data
                     yield data

Connected minions still randomly fail to return results, and that’s probably a separate issue to look into and address.
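The guard from the patch can be exercised in isolation with a small sketch. It assumes Python 3; normalize_batch_return and collect are hypothetical helpers written for illustration and are not part of salt/cli/batch.py. Only the data.get('failed') check and the placeholder message are taken from the diff.

```python
import logging

log = logging.getLogger(__name__)


def normalize_batch_return(minion, data):
    """Hypothetical helper: replace a {'failed': True} marker with a placeholder return."""
    if data.get('failed') is True:
        log.debug('Minion %s failed to respond: data=%s', minion, data)
        return {'ret': 'Minion did not return. [Failed]'}
    return data


def collect(parts):
    """Build the per-minion result dict without raising KeyError on failed minions."""
    ret = {}
    for minion, data in parts.items():
        data = normalize_batch_return(minion, data)
        ret[minion] = data['ret']
    return ret


# Sample input with assumed shapes; minion names are illustrative.
result = collect({
    'minion-a': {'ret': True},
    'minion-b': {'failed': True},
})
print(result)
```

With this guard, a non-responding minion shows up in the results with the placeholder message, and the remaining minions in the batch are still processed.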