salt: [BUG] Memory leak in master ProcessManager

Description

We upgraded our infrastructure from 3002.6 to 3003 and observed a severe memory leak in the ProcessManager of the Salt master. We run both OSS and SSE masters and could observe the same behavior on all of them.

Steps to Reproduce the behavior

The issue is repeatable here and immediately visible after restarting salt-master (the trend is visible within 5-15 minutes).

Screenshots

[Screenshot: Screenshot_20210419_090141]

Versions Report

salt --versions-report
Salt Version:
          Salt: 3003

Dependency Versions:
          cffi: 1.14.1
      cherrypy: unknown
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.11.1
       libgit2: 1.0.1
      M2Crypto: 0.35.2
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: 1.3.12
     pycparser: 2.14
      pycrypto: Not Installed
  pycryptodome: 3.10.1
        pygit2: 1.2.1
        Python: 3.6.8 (default, Nov 16 2020, 16:55:22)
  python-gnupg: Not Installed
        PyYAML: 3.13
         PyZMQ: 17.0.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.1.4

Salt Extensions:
        sseape: 8.3.0+4

System Versions:
          dist: centos 7 Core
        locale: UTF-8
       machine: x86_64
       release: 3.10.0-1127.19.1.el7.x86_64
        system: Linux
       version: CentOS Linux 7 Core

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 17 (16 by maintainers)

Most upvoted comments

We see this issue on our system too. The salt-master becomes unresponsive and requires a restart. Depending on the load, the memory leak accumulates faster or slower: sometimes the master stays workable for a couple of weeks, sometimes it exhausts the memory in a day. A single process out of all those spawned by the salt-master is responsible for this. Not sure what it does.

top -p 24530 
    PID USER   PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
24530 root   20   0 10.539g 9.972g   3096 R  66.7 85.0  21:40.25 salt-master  <- memory use at the time of non-responsiveness

Soon after restart:

ps -eo pid,ppid,%mem,%cpu,cmd --sort=-%mem | grep salt-master
  1394   1107  3.0  0.3 /usr/bin/python3 /usr/bin/salt-master  <- note this process: it grew from 0 to 3% in a few hours
  1396   1107  0.7  0.8 /usr/bin/python3 /usr/bin/salt-master
  1429   1398  0.6  0.4 /usr/bin/python3 /usr/bin/salt-master
  1423   1398  0.6  0.6 /usr/bin/python3 /usr/bin/salt-master
  1427   1398  0.6  0.4 /usr/bin/python3 /usr/bin/salt-master
  1436   1398  0.6  0.4 /usr/bin/python3 /usr/bin/salt-master
  1434   1398  0.6  0.3 /usr/bin/python3 /usr/bin/salt-master
  1435   1398  0.6  0.4 /usr/bin/python3 /usr/bin/salt-master
  1418   1398  0.6  0.5 /usr/bin/python3 /usr/bin/salt-master
  1420   1398  0.6  0.4 /usr/bin/python3 /usr/bin/salt-master
  1428   1398  0.6  0.4 /usr/bin/python3 /usr/bin/salt-master
  1431   1398  0.6  0.5 /usr/bin/python3 /usr/bin/salt-master
  1430   1398  0.6  0.4 /usr/bin/python3 /usr/bin/salt-master
  1402   1398  0.6  0.4 /usr/bin/python3 /usr/bin/salt-master
  1400   1398  0.6  0.4 /usr/bin/python3 /usr/bin/salt-master
  1419   1398  0.5  0.4 /usr/bin/python3 /usr/bin/salt-master
  1401   1398  0.5  0.5 /usr/bin/python3 /usr/bin/salt-master
  1395   1107  0.5  0.3 /usr/bin/python3 /usr/bin/salt-master
  1399   1398  0.5  0.1 /usr/bin/python3 /usr/bin/salt-master
  1391   1107  0.4  0.0 /usr/bin/python3 /usr/bin/salt-master
  1107      1  0.4  0.0 /usr/bin/python3 /usr/bin/salt-master
  1417   1107  0.3  0.2 /usr/bin/python3 /usr/bin/salt-master
  1398   1107  0.3  0.0 /usr/bin/python3 /usr/bin/salt-master
  1180   1107  0.2  0.0 /usr/bin/python3 /usr/bin/salt-master
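
The per-process listing above can be automated so the leaker stands out over time. A minimal, Linux-only sketch (a hypothetical helper, not part of Salt) that samples the resident set size of every salt-master process from /proc — the PID whose RSS keeps climbing is the one to investigate:

```python
"""Sample the RSS of every salt-master process from /proc (Linux only)."""
import os
import time


def rss_kib(pid):
    """Return resident set size in KiB for a PID, or None if it has exited."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # value is reported in kB
    except (FileNotFoundError, ProcessLookupError):
        return None
    return None


def salt_master_pids():
    """PIDs whose command line mentions salt-master."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ")
        except (FileNotFoundError, PermissionError):
            continue
        if b"salt-master" in cmdline:
            pids.append(int(entry))
    return pids


def sample(count=3, interval=1):
    """Print one RSS reading per salt-master PID, `count` times."""
    for _ in range(count):
        for pid in salt_master_pids():
            print(pid, rss_kib(pid), "KiB")
        time.sleep(interval)

# e.g. sample(count=60, interval=60) to watch the trend over an hour
```

Logging the output to a file once a minute makes the 5-15 minute trend mentioned in the report easy to graph.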

Versions:

Salt Version:
          Salt: 3002.2

Dependency Versions:
          cffi: 1.14.4
      cherrypy: 3.5.0
      dateutil: 2.4.2
     docker-py: Not Installed
         gitdb: 0.6.4
     gitpython: 1.0.1
        Jinja2: 2.11.2
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.17
      pycrypto: Not Installed
  pycryptodome: 3.4.7
        pygit2: Not Installed
        Python: 3.5.2 (default, Jan 26 2021, 13:30:48)
  python-gnupg: 0.3.8
        PyYAML: 3.12
         PyZMQ: 17.0.0
         smmap: 0.9.0
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.1.6

System Versions:
          dist: ubuntu 16.04 Xenial Xerus
        locale: UTF-8
       machine: x86_64
       release: 4.4.0-141-generic
        system: Linux
       version: Ubuntu 16.04 Xenial Xerus

If you install the setproctitle Python module, the process list will show what most of the salt-master processes do (there are two or three that don't get labels, though). For troubleshooting memory leaks it is important to know which process is causing the problem.
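
Once the growing PID is identified, Python's standard-library tracemalloc can show where allocations accumulate inside that process. A minimal standalone sketch with a simulated leak (illustration only, not Salt code):

```python
"""Demonstrate locating a leak's allocation site with tracemalloc."""
import tracemalloc

tracemalloc.start()

# Simulate a leak: a list that only ever grows.
leak = []
for _ in range(1000):
    leak.append(bytearray(1024))  # ~1 MiB retained in total

# Group still-live allocations by source line; the leaking line dominates.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```

In a real debugging session the snapshot would be taken inside the suspect salt-master process (for example from a signal handler or a debug hook), and two snapshots some minutes apart can be diffed with `snapshot2.compare_to(snapshot1, "lineno")`.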

Also, since you are running 3002.2, the memory leak you are reporting will be vastly different from the one in this ticket. 3002.2 had its own memory issues; this ticket should be for 3003+.