salt: Salt Tornado API: salt.exceptions.SaltReqTimeoutError: Message timed out
Description of Issue/Question
We just started using the salt API a few months ago and we cannot fix this issue. Currently, we are using Pepper to make the requests. I have created an “async” wrapper around it which runs a local_async command, polls with the jobs.exit_success runner for 30 seconds to a minute, then does a lookup_jid to get the results. Currently, the only thing we use this for is to retrieve all of the grains from the minions and store them in a database (just running grains.items non-batch mode). The output below does not contain the salt API logs from the successful local_async command and successful exit_success runners. It only contains the one that eventually fails.
When using the tornado API, about 2/3 of my “jobs” fail with the following message (trace logging on):
2019-05-16 06:08:28,354 [salt.utils.lazy :104 ][DEBUG ][148120] LazyLoaded jobs.exit_success
2019-05-16 06:08:28,354 [salt.transport.zeromq:136 ][DEBUG ][148120] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'ch0pslop022_master', 'tcp://127.0.0.1:4506', 'clear')
2019-05-16 06:08:28,354 [salt.transport.zeromq:202 ][DEBUG ][148120] Connecting the Minion to the Master URI (for the return server): tcp://127.0.0.1:4506
2019-05-16 06:08:28,354 [salt.transport.zeromq:1084][DEBUG ][148120] Trying to connect to: tcp://127.0.0.1:4506
2019-05-16 06:09:28,391 [salt.transport.zeromq:1139][DEBUG ][148120] SaltReqTimeoutError, retrying. (1/3)
2019-05-16 06:10:28,451 [salt.transport.zeromq:1139][DEBUG ][148120] SaltReqTimeoutError, retrying. (2/3)
2019-05-16 06:11:28,509 [salt.transport.zeromq:1139][DEBUG ][148120] SaltReqTimeoutError, retrying. (3/3)
2019-05-16 06:12:28,569 [salt.transport.zeromq:1109][DEBUG ][148120] Re-init ZMQ socket: Message timed out
2019-05-16 06:12:28,570 [salt.transport.zeromq:1084][DEBUG ][148120] Trying to connect to: tcp://127.0.0.1:4506
2019-05-16 06:12:28,570 [salt.loaded.int.netapi.rest_tornado.saltnado:926 ][ERROR ][148120] Unexpected exception while handling request:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/netapi/rest_tornado/saltnado.py", line 919, in disbatch
chunk_ret = yield getattr(self, '_disbatch_{0}'.format(low['client']))(low)
File "/usr/lib/python3/dist-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib/python3/dist-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/usr/lib/python3/dist-packages/tornado/gen.py", line 230, in wrapper
yielded = next(result)
File "/usr/lib/python3/dist-packages/salt/netapi/rest_tornado/saltnado.py", line 1114, in _disbatch_runner
pub_data = self.saltclients['runner'](chunk)
File "/usr/lib/python3/dist-packages/salt/runner.py", line 126, in cmd_async
return mixins.AsyncClientMixin.cmd_async(self, reformatted_low)
File "/usr/lib/python3/dist-packages/salt/client/mixins.py", line 505, in cmd_async
return self.master_call(**low)
File "/usr/lib/python3/dist-packages/salt/client/mixins.py", line 140, in master_call
ret = channel.send(load)
File "/usr/lib/python3/dist-packages/salt/utils/asynchronous.py", line 64, in wrap
ret = self._block_future(ret)
File "/usr/lib/python3/dist-packages/salt/utils/asynchronous.py", line 74, in _block_future
return future.result()
File "/usr/lib/python3/dist-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/usr/lib/python3/dist-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 331, in send
ret = yield self._uncrypted_transfer(load, tries=tries, timeout=timeout)
File "/usr/lib/python3/dist-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib/python3/dist-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/usr/lib/python3/dist-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 320, in _uncrypted_transfer
tries=tries,
File "/usr/lib/python3/dist-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib/python3/dist-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
salt.exceptions.SaltReqTimeoutError: Message timed out
Setup
(Please provide relevant configs and/or SLS files (Be sure to remove sensitive info).) We have a 3 master multi-master setup. The master which this is running on is a 16 core 128 GB RAM physical host (plenty of hardware).
Steps to Reproduce Issue
(Include debug logs if possible and relevant.) See description of issue. May be complicated to reproduce but I can hand over more code if needed.
Versions Report
(Provided by running salt --versions-report
. Please also mention any differences in master/minion versions.)
[zbarr@saltmaster03: ~]$ salt --versions-report
Salt Version:
Salt: 2018.3.4
Dependency Versions:
cffi: Not Installed
cherrypy: 3.5.0
dateutil: 2.4.2
docker-py: Not Installed
gitdb: 0.6.4
gitpython: 1.0.1
ioflo: Not Installed
Jinja2: 2.8
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 3.5.2 (default, Nov 23 2017, 16:37:01)
python-gnupg: Not Installed
PyYAML: 3.11
PyZMQ: 17.1.2
RAET: Not Installed
smmap: 0.9.0
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.2.5
System Versions:
dist: Ubuntu 16.04 xenial
locale: UTF-8
machine: x86_64
release: 4.4.0-119-generic
system: Linux
version: Ubuntu 16.04 xenial
Please advise. Thanks!
About this issue
- Original URL
- State: open
- Created 5 years ago
- Comments: 23 (8 by maintainers)
Commits related to this issue
- salt: Increase Salt master `sock_pool_size` and `worker_threads` Time to time salt-master get overloaded because he receive to much query, for example during upgrade and one environment a bit slow so... — committed to scality/metalk8s by TeddyAndrieux 3 years ago
- salt: Increase Salt master `sock_pool_size` and `worker_threads` Time to time salt-master get overloaded because he receive to much query, for example during upgrade and one environment a bit slow so... — committed to scality/metalk8s by TeddyAndrieux 3 years ago
- salt: Increase Salt master `sock_pool_size` and `worker_threads` Time to time salt-master get overloaded because he receive to much query, for example during upgrade and one environment a bit slow so... — committed to scality/metalk8s by TeddyAndrieux 3 years ago
- salt: Increase Salt master `sock_pool_size` and `worker_threads` Time to time salt-master get overloaded because he receive to much query, for example during upgrade and one environment a bit slow so... — committed to scality/metalk8s by TeddyAndrieux 3 years ago
- salt: Increase Salt master `sock_pool_size` and `worker_threads` Time to time salt-master get overloaded because he receive to much query, for example during upgrade and one environment a bit slow so... — committed to scality/metalk8s by TeddyAndrieux 3 years ago
Still occurring on
3006.1
master with3006.1
minions.It’s a 8-core machine with 16 GB RAM. We have about 500 minions and the load average is typically around 1.2.
worker_threads
is set to 12 andsock_pool_size
set to 15.