salt: [BUG] Memory Leak in EventPublisher process

Description

We’ve observed a memory leak in the EventPublisher process on the master. It can be reproduced by adding a simple engine to a salt-minion that sends a large number of auth requests. The leak does not appear to be specific to auth events; a large volume of events of any kind should trigger it.

Setup

import logging

import salt.crypt
import salt.ext.tornado.gen
import salt.ext.tornado.ioloop

log = logging.getLogger(__name__)


@salt.ext.tornado.gen.coroutine
def do_auth(opts, io_loop):
    # Sign in to the master in a tight loop; each sign-in sends an auth request.
    while True:
        auth = salt.crypt.AsyncAuth(opts, io_loop=io_loop)
        log.info("ENGINE DO AUTH")
        yield auth.sign_in()


def start():
    # Engine entry point; __opts__ is injected by the Salt loader.
    io_loop = salt.ext.tornado.ioloop.IOLoop()
    __opts__['master_uri'] = 'tcp://127.0.0.1:4506'
    io_loop.spawn_callback(do_auth, __opts__, io_loop)
    io_loop.start()

Versions

v3004

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 23 (12 by maintainers)

Most upvoted comments

We are running salt 3006.6 and are seeing this memory leak. I examined salt/minion.py and it already has the fix suggested in https://github.com/saltstack/salt/issues/61565#issuecomment-1867502647

Further, after the upgrade from 3006.5 to 3006.6 we are now seeing this problem appear much sooner than before. It used to take about 20 days; now we’ve noticed it after just 7 days.

@max-arnold

In the past we’ve relied on __del__ to clean things like this up. That is generally considered an anti-pattern in Python, and we’ve been working towards removing that practice. Most recently I’ve started adding warnings if an object is garbage collected without being properly closed (https://github.com/saltstack/salt/pull/65559). This has revealed several places in our test suite (now fixed) where transport clients were not being closed. With this code in place it should be easier for users to identify these kinds of issues and report them with useful debug info.
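
To illustrate the general pattern described above, here is a minimal sketch of an object that warns when it is garbage collected without having been closed. It is not the code from the linked pull request: the class name TrackedClient and its attributes are hypothetical, and a real transport client would release sockets or streams in close().

```python
import warnings


class TrackedClient:
    """Hypothetical stand-in for a transport client; not Salt's actual class."""

    def __init__(self, name):
        self.name = name
        self._closed = False

    def close(self):
        # Explicit, idempotent cleanup. Real code would release sockets here.
        self._closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

    def __del__(self):
        # Do not clean up here; only report that close() was never called.
        # getattr guards against __init__ having failed part-way through.
        if not getattr(self, "_closed", True):
            warnings.warn(
                f"unclosed client {self.name!r}; call close() explicitly",
                ResourceWarning,
                source=self,
            )
```

Python ignores ResourceWarning by default, so a warning like this only becomes visible when the interpreter is started with something like -W default::ResourceWarning or -X dev. Using the object as a context manager (with TrackedClient("x") as client: ...) closes it deterministically and keeps the warning silent.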

We’ve also been working on better tooling for debugging running Salt processes. We’ve added debug symbol packages in 3006.x, and there is a newer tool, relenv-gdb-dbg, to help debug these kinds of issues.