salt: [BUG] Big payload cuts master from minion

Description Upon big payload (350Kb) a zombie process appears in salt-minion. Salt-master is not able to communicate with the salt-minion but salt-minion is able to communicate with salt-master.

Resolution:

  1. Either restart the salt-minion
  2. Or set net.core.wmem_default to something bigger then the default (212992) - in our case 26214400

Setup (Please provide relevant configs and/or SLS files (be sure to remove sensitive info. There is no general set-up of Salt.)

Please be as specific as possible and give set-up details.

  • on-prem machine
  • VM (Virtualbox, KVM, etc. please specify)
  • VM running on a cloud service, please be explicit and add details
  • container (Kubernetes, Docker, containerd, etc. please specify)
  • or a combination, please be explicit
  • jails if it is FreeBSD
  • classic packaging
  • onedir packaging
  • used bootstrap to install

Steps to Reproduce the behavior (Include debug logs if possible and relevant)

Create a state with big payload or reduce net.core.wmem_default

During applying the state:

  19048 ?        Ss   00:00:00 /usr/bin/python3 /usr/bin/salt-minion
  19068 ?        Sl   00:00:02  \_ /usr/bin/python3 /usr/bin/salt-minion
  28158 ?        Sl   00:00:00      \_ /usr/bin/python3 /usr/bin/salt-minion

After the state is applied:

    PID TT       STAT     TIME CMD
  19048 ?        Ss   00:00:00 /usr/bin/python3 /usr/bin/salt-minion
  19068 ?        Sl   00:00:02  \_ /usr/bin/python3 /usr/bin/salt-minion
  28158 ?        Z    00:00:05      \_ [salt-minion] <defunct>

From the log:

Apr 13 08:37:36 salt-master salt-minion[28158]: --- Logging error ---
Apr 13 08:37:36 salt-master salt-minion[28158]: Traceback (most recent call last):
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/logging/handlers.py", line 934, in emit
Apr 13 08:37:36 salt-master salt-minion[28158]:     self.socket.send(msg)
Apr 13 08:37:36 salt-master salt-minion[28158]: OSError: [Errno 90] Message too long
Apr 13 08:37:36 salt-master salt-minion[28158]: During handling of the above exception, another exception occurred:
Apr 13 08:37:36 salt-master salt-minion[28158]: Traceback (most recent call last):
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/logging/handlers.py", line 938, in emit
Apr 13 08:37:36 salt-master salt-minion[28158]:     self.socket.send(msg)
Apr 13 08:37:36 salt-master salt-minion[28158]: OSError: [Errno 90] Message too long
Apr 13 08:37:36 salt-master salt-minion[28158]: Call stack:
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/bin/salt-minion", line 11, in <module>
Apr 13 08:37:36 salt-master salt-minion[28158]:     load_entry_point('salt==3005.1', 'console_scripts', 'salt-minion')()
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/scripts.py", line 213, in salt_minion
Apr 13 08:37:36 salt-master salt-minion[28158]:     process.start()
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
Apr 13 08:37:36 salt-master salt-minion[28158]:     self._popen = self._Popen(self)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
Apr 13 08:37:36 salt-master salt-minion[28158]:     return _default_context.get_context().Process._Popen(process_obj)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
Apr 13 08:37:36 salt-master salt-minion[28158]:     return Popen(process_obj)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
Apr 13 08:37:36 salt-master salt-minion[28158]:     self._launch(process_obj)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 75, in _launch
Apr 13 08:37:36 salt-master salt-minion[28158]:     code = process_obj._bootstrap(parent_sentinel=child_r)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
Apr 13 08:37:36 salt-master salt-minion[28158]:     self.run()
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
Apr 13 08:37:36 salt-master salt-minion[28158]:     self._target(*self._args, **self._kwargs)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/scripts.py", line 142, in minion_process
Apr 13 08:37:36 salt-master salt-minion[28158]:     minion.start()
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/cli/daemons.py", line 325, in start
Apr 13 08:37:36 salt-master salt-minion[28158]:     self._real_start()
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/cli/daemons.py", line 337, in _real_start
Apr 13 08:37:36 salt-master salt-minion[28158]:     self.minion.tune_in()
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/minion.py", line 1182, in tune_in
Apr 13 08:37:36 salt-master salt-minion[28158]:     self.io_loop.start()
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 834, in start
Apr 13 08:37:36 salt-master salt-minion[28158]:     self._run_callback(self._callbacks.popleft())
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
Apr 13 08:37:36 salt-master salt-minion[28158]:     ret = callback()
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
Apr 13 08:37:36 salt-master salt-minion[28158]:     return fn(*args, **kwargs)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 294, in wrapper
Apr 13 08:37:36 salt-master salt-minion[28158]:     result = func(*args, **kwargs)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/types.py", line 278, in wrapped
Apr 13 08:37:36 salt-master salt-minion[28158]:     coro = func(*args, **kwargs)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/minion.py", line 1059, in handle_event
Apr 13 08:37:36 salt-master salt-minion[28158]:     minion.handle_event(package)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 309, in wrapper
Apr 13 08:37:36 salt-master salt-minion[28158]:     yielded = next(result)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/minion.py", line 2712, in handle_event
Apr 13 08:37:36 salt-master salt-minion[28158]:     _minion.manage_schedule(tag, data)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/minion.py", line 2561, in manage_schedule
Apr 13 08:37:36 salt-master salt-minion[28158]:     getattr(self.schedule, alias)(*params)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/utils/schedule.py", line 511, in run_job
Apr 13 08:37:36 salt-master salt-minion[28158]:     self._run_job(func, data)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/utils/schedule.py", line 1822, in _run_job
Apr 13 08:37:36 salt-master salt-minion[28158]:     proc.start()
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/utils/process.py", line 1097, in start
Apr 13 08:37:36 salt-master salt-minion[28158]:     super().start()
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
Apr 13 08:37:36 salt-master salt-minion[28158]:     self._popen = self._Popen(self)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
Apr 13 08:37:36 salt-master salt-minion[28158]:     return _default_context.get_context().Process._Popen(process_obj)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
Apr 13 08:37:36 salt-master salt-minion[28158]:     return Popen(process_obj)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
Apr 13 08:37:36 salt-master salt-minion[28158]:     self._launch(process_obj)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 75, in _launch
Apr 13 08:37:36 salt-master salt-minion[28158]:     code = process_obj._bootstrap(parent_sentinel=child_r)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
Apr 13 08:37:36 salt-master salt-minion[28158]:     self.run()
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/utils/process.py", line 993, in wrapped_run_func
Apr 13 08:37:36 salt-master salt-minion[28158]:     return run_func()
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
Apr 13 08:37:36 salt-master salt-minion[28158]:     self._target(*self._args, **self._kwargs)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/utils/schedule.py", line 912, in handle_func
Apr 13 08:37:36 salt-master salt-minion[28158]:     event.fire_event(load, "__schedule_return")
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/utils/event.py", line 821, in fire_event
Apr 13 08:37:36 salt-master salt-minion[28158]:     log.debug("Sending event: tag = %s; data = %s", tag, data)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3.8/logging/__init__.py", line 1434, in debug
Apr 13 08:37:36 salt-master salt-minion[28158]:     self._log(DEBUG, msg, args, **kwargs)
Apr 13 08:37:36 salt-master salt-minion[28158]:   File "/usr/lib/python3/dist-packages/salt/_logging/impl.py", line 284, in _log
Apr 13 08:37:36 salt-master salt-minion[28158]:     LOGGING_LOGGER_CLASS._log(
Apr 13 08:37:36 salt-master salt-minion[28158]: Message: 'Sending event: tag = %s; data = %s'

From other server with onedir the linux process table provides the more context for the same command

     PID TT       STAT     TIME CMD
3909307 ?        Sl   00:00:01  \_ /opt/saltstack/salt/run/run minion MultiMinionProcessManager MinionProcessManager
3914621 ?        Sl   00:00:00      \_ /opt/saltstack/salt/run/run minion MultiMinionProcessManager MinionProcessManager Schedule(name=formula.salt.minion-schedule-highstate, jid=None)

Expected behavior Documentation how to setup wmem_default related to salt. Meaningful log message. Prevent zombie process and cutting salt-master from the minion.

Screenshots

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
# salt --versions
Salt Version:
          Salt: 3005.1
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.7.3
     docker-py: 4.1.0
         gitdb: 2.0.6
     gitpython: 3.0.7
        Jinja2: 2.10.1
       libgit2: 0.28.3
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: 2.6.1
  pycryptodome: 3.6.1
        pygit2: 1.0.3
        Python: 3.8.10 (default, Mar 13 2023, 10:26:41)
  python-gnupg: 0.4.5
        PyYAML: 5.3.1
         PyZMQ: 20.0.0
         smmap: 2.0.5
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.2
 
System Versions:
          dist: ubuntu 20.04 focal
        locale: utf-8
       machine: x86_64
       release: 5.4.0-139-generic
        system: Linux
       version: Ubuntu 20.04 focal

Additional context Add any other context about the problem here.

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

I agree that it’s not a perfect solution.

IMO we should still add it to sysutils/py-salt for now, maybe with a brief description of how to identify the bug in case some system still hits it even with the much higher limit.