salt: salt.client.LocalClient is not thread-safe
Even when salt.client.LocalClient
instances are not shared, running them simultaneously in multiple threads results in whole Python process crashing in zmq/core/context.so
. This issue is also reproducible with LocalClient
being reused/shared across multiple threads.
The following test script demonstrates the issue:
import salt.client
import time
import threading
def spawn_local_client_and_run():
def local_client_run():
client = salt.client.LocalClient(c_path='/etc/salt/master')
client.cmd('ondev-salt', 'anything')
counter = 0
while True:
t = threading.Thread(target=local_client_run, name=('thread%d' % counter))
t.start()
counter += 1
#100 simultaneous threads is ought to be enough for anyone!
if counter % 100 == 0:
print '.'
time.sleep(1)
if __name__ == '__main__':
spawn_local_client_and_run()
On my Ubuntu 12.10 it crashes almost immediately with the following stack trace:
*** glibc detected *** python: double free or corruption (out): 0x00007f315400b760 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7f3171b37b96]
/usr/lib/python2.7/dist-packages/zmq/core/context.so(+0x3c61)[0x7f316edf5c61]
python[0x4c99d4]
/usr/lib/python2.7/dist-packages/zmq/core/context.so(+0x3d4f)[0x7f316edf5d4f]
/usr/lib/python2.7/dist-packages/zmq/core/socket.so(+0x61f3)[0x7f316ebe11f3]
python[0x4b59f9]
python[0x48a7e0]
python[0x4b5483]
python(PyEval_EvalCodeEx+0x1b0)[0x467220]
...
It would first seem like the issue is with bad usage of ZMQ in salt.client
(as suggested by https://github.com/zeromq/pyzmq/issues/131), but to me that code looks OK (i.e. it doesn’t use a global socket, which might be shared across threads). May be several LocalClient instances are sharing some resources, which they should not, or maybe it is a ZMQ issue after all?
About this issue
- Original URL
- State: closed
- Created 12 years ago
- Comments: 16 (10 by maintainers)
Commits related to this issue
- Implemented a workaround to saltstack/salt#2536 and also implemented the async version of the executor (ON-584, ON-702) — committed to opennode/opennode-knot by deemoowoor 12 years ago
I confirm this still happens to me too on 2017.7.5. In my case it happens when I call multiple reactors that consequently call a custom module that uses LocalClient. Usually when I get up to 3 or 4 simultaneous requests, it works just fine… Above that, it starts crashing with:
Exception: DummyFuture does not support blocking for results
Can we please reopen this issue?