salt: [BUG] Big payload cuts master from minion
Description Upon big payload (350Kb) a zombie process appears in salt-minion. Salt-master is not able to communicate with the salt-minion but salt-minion is able to communicate with salt-master.
Resolution:
- Either restart the salt-minion
- Or set
net.core.wmem_defaultto something bigger then the default (212992) - in our case26214400
Setup (Please provide relevant configs and/or SLS files (be sure to remove sensitive info. There is no general set-up of Salt.)
Please be as specific as possible and give set-up details.
- on-prem machine
- VM (Virtualbox, KVM, etc. please specify)
- VM running on a cloud service, please be explicit and add details
- container (Kubernetes, Docker, containerd, etc. please specify)
- or a combination, please be explicit
- jails if it is FreeBSD
- classic packaging
- onedir packaging
- used bootstrap to install
Steps to Reproduce the behavior (Include debug logs if possible and relevant)
Create a state with big payload or reduce net.core.wmem_default
During applying the state:
19048 ? Ss 00:00:00 /usr/bin/python3 /usr/bin/salt-minion
19068 ? Sl 00:00:02 \_ /usr/bin/python3 /usr/bin/salt-minion
28158 ? Sl 00:00:00 \_ /usr/bin/python3 /usr/bin/salt-minion
After the state is applied:
PID TT STAT TIME CMD
19048 ? Ss 00:00:00 /usr/bin/python3 /usr/bin/salt-minion
19068 ? Sl 00:00:02 \_ /usr/bin/python3 /usr/bin/salt-minion
28158 ? Z 00:00:05 \_ [salt-minion] <defunct>
From the log:
Apr 13 08:37:36 salt-master salt-minion[28158]: --- Logging error ---
Apr 13 08:37:36 salt-master salt-minion[28158]: Traceback (most recent call last):
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/logging/handlers.py", line 934, in emit
Apr 13 08:37:36 salt-master salt-minion[28158]: self.socket.send(msg)
Apr 13 08:37:36 salt-master salt-minion[28158]: OSError: [Errno 90] Message too long
Apr 13 08:37:36 salt-master salt-minion[28158]: During handling of the above exception, another exception occurred:
Apr 13 08:37:36 salt-master salt-minion[28158]: Traceback (most recent call last):
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/logging/handlers.py", line 938, in emit
Apr 13 08:37:36 salt-master salt-minion[28158]: self.socket.send(msg)
Apr 13 08:37:36 salt-master salt-minion[28158]: OSError: [Errno 90] Message too long
Apr 13 08:37:36 salt-master salt-minion[28158]: Call stack:
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/bin/salt-minion", line 11, in <module>
Apr 13 08:37:36 salt-master salt-minion[28158]: load_entry_point('salt==3005.1', 'console_scripts', 'salt-minion')()
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/scripts.py", line 213, in salt_minion
Apr 13 08:37:36 salt-master salt-minion[28158]: process.start()
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
Apr 13 08:37:36 salt-master salt-minion[28158]: self._popen = self._Popen(self)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
Apr 13 08:37:36 salt-master salt-minion[28158]: return _default_context.get_context().Process._Popen(process_obj)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
Apr 13 08:37:36 salt-master salt-minion[28158]: return Popen(process_obj)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
Apr 13 08:37:36 salt-master salt-minion[28158]: self._launch(process_obj)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 75, in _launch
Apr 13 08:37:36 salt-master salt-minion[28158]: code = process_obj._bootstrap(parent_sentinel=child_r)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
Apr 13 08:37:36 salt-master salt-minion[28158]: self.run()
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
Apr 13 08:37:36 salt-master salt-minion[28158]: self._target(*self._args, **self._kwargs)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/scripts.py", line 142, in minion_process
Apr 13 08:37:36 salt-master salt-minion[28158]: minion.start()
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/cli/daemons.py", line 325, in start
Apr 13 08:37:36 salt-master salt-minion[28158]: self._real_start()
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/cli/daemons.py", line 337, in _real_start
Apr 13 08:37:36 salt-master salt-minion[28158]: self.minion.tune_in()
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/minion.py", line 1182, in tune_in
Apr 13 08:37:36 salt-master salt-minion[28158]: self.io_loop.start()
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 834, in start
Apr 13 08:37:36 salt-master salt-minion[28158]: self._run_callback(self._callbacks.popleft())
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
Apr 13 08:37:36 salt-master salt-minion[28158]: ret = callback()
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
Apr 13 08:37:36 salt-master salt-minion[28158]: return fn(*args, **kwargs)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 294, in wrapper
Apr 13 08:37:36 salt-master salt-minion[28158]: result = func(*args, **kwargs)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/types.py", line 278, in wrapped
Apr 13 08:37:36 salt-master salt-minion[28158]: coro = func(*args, **kwargs)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/minion.py", line 1059, in handle_event
Apr 13 08:37:36 salt-master salt-minion[28158]: minion.handle_event(package)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 309, in wrapper
Apr 13 08:37:36 salt-master salt-minion[28158]: yielded = next(result)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/minion.py", line 2712, in handle_event
Apr 13 08:37:36 salt-master salt-minion[28158]: _minion.manage_schedule(tag, data)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/minion.py", line 2561, in manage_schedule
Apr 13 08:37:36 salt-master salt-minion[28158]: getattr(self.schedule, alias)(*params)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/utils/schedule.py", line 511, in run_job
Apr 13 08:37:36 salt-master salt-minion[28158]: self._run_job(func, data)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/utils/schedule.py", line 1822, in _run_job
Apr 13 08:37:36 salt-master salt-minion[28158]: proc.start()
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/utils/process.py", line 1097, in start
Apr 13 08:37:36 salt-master salt-minion[28158]: super().start()
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
Apr 13 08:37:36 salt-master salt-minion[28158]: self._popen = self._Popen(self)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
Apr 13 08:37:36 salt-master salt-minion[28158]: return _default_context.get_context().Process._Popen(process_obj)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
Apr 13 08:37:36 salt-master salt-minion[28158]: return Popen(process_obj)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
Apr 13 08:37:36 salt-master salt-minion[28158]: self._launch(process_obj)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 75, in _launch
Apr 13 08:37:36 salt-master salt-minion[28158]: code = process_obj._bootstrap(parent_sentinel=child_r)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
Apr 13 08:37:36 salt-master salt-minion[28158]: self.run()
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/utils/process.py", line 993, in wrapped_run_func
Apr 13 08:37:36 salt-master salt-minion[28158]: return run_func()
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
Apr 13 08:37:36 salt-master salt-minion[28158]: self._target(*self._args, **self._kwargs)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/utils/schedule.py", line 912, in handle_func
Apr 13 08:37:36 salt-master salt-minion[28158]: event.fire_event(load, "__schedule_return")
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/utils/event.py", line 821, in fire_event
Apr 13 08:37:36 salt-master salt-minion[28158]: log.debug("Sending event: tag = %s; data = %s", tag, data)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3.8/logging/__init__.py", line 1434, in debug
Apr 13 08:37:36 salt-master salt-minion[28158]: self._log(DEBUG, msg, args, **kwargs)
Apr 13 08:37:36 salt-master salt-minion[28158]: File "/usr/lib/python3/dist-packages/salt/_logging/impl.py", line 284, in _log
Apr 13 08:37:36 salt-master salt-minion[28158]: LOGGING_LOGGER_CLASS._log(
Apr 13 08:37:36 salt-master salt-minion[28158]: Message: 'Sending event: tag = %s; data = %s'
From other server with onedir the linux process table provides the more context for the same command
PID TT STAT TIME CMD
3909307 ? Sl 00:00:01 \_ /opt/saltstack/salt/run/run minion MultiMinionProcessManager MinionProcessManager
3914621 ? Sl 00:00:00 \_ /opt/saltstack/salt/run/run minion MultiMinionProcessManager MinionProcessManager Schedule(name=formula.salt.minion-schedule-highstate, jid=None)
Expected behavior Documentation how to setup wmem_default related to salt. Meaningful log message. Prevent zombie process and cutting salt-master from the minion.
Screenshots
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)# salt --versions
Salt Version:
Salt: 3005.1
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 2.7.3
docker-py: 4.1.0
gitdb: 2.0.6
gitpython: 3.0.7
Jinja2: 2.10.1
libgit2: 0.28.3
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 0.6.2
msgpack-pure: Not Installed
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: 3.6.1
pygit2: 1.0.3
Python: 3.8.10 (default, Mar 13 2023, 10:26:41)
python-gnupg: 0.4.5
PyYAML: 5.3.1
PyZMQ: 20.0.0
smmap: 2.0.5
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.2
System Versions:
dist: ubuntu 20.04 focal
locale: utf-8
machine: x86_64
release: 5.4.0-139-generic
system: Linux
version: Ubuntu 20.04 focal
Additional context Add any other context about the problem here.
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 15 (7 by maintainers)
IMO we should still add it to sysutils/py-salt for now, maybe with a brief description of how to identify the bug in case some system still hits it even with the much higher limit.