python: Python freeze/hang on exit

Context

Hello,

In a batch manager project, we are using this Python client to submit jobs to the Kubernetes API. In short, the Python application loads the library, submits the job, watches the related events, cleans up/deletes the job, then returns the succeeded or failed status. But sometimes the application hangs at exit.

After investigation, it seems the ThreadPool used in the ApiClient class is not properly cleaned up when the Python process exits.
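
For context, the application's flow is roughly equivalent to the sketch below. This is only an illustration: the namespace, job name, and container are placeholders, not our real code.

# sketch of the batch manager flow (placeholders, not production code)
from kubernetes import client, config, watch

config.load_kube_config()  # or load_incluster_config() inside a cluster

# Build a minimal job definition.
job_body = client.V1Job(
    metadata=client.V1ObjectMeta(name="example-job"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(name="main", image="busybox",
                                               command=["true"])]))))

# Submit the job.
batch = client.BatchV1Api()
batch.create_namespaced_job(namespace="default", body=job_body)

# Watch the related events for a while.
core = client.CoreV1Api()
w = watch.Watch()
for event in w.stream(core.list_namespaced_event, namespace="default",
                      timeout_seconds=60):
    print(event["object"].reason)

# Clean up the job, then the process exits with the final status.
batch.delete_namespaced_job(name="example-job", namespace="default",
                            body=client.V1DeleteOptions())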

Reproduce

The easiest way to reproduce is to run this snippet:

python_version=3.6
kubernetes_version=4.0.0

docker run --name testing --rm -it --entrypoint "" python:$python_version /bin/bash -c "
pip install 'kubernetes==$kubernetes_version'
while true; do echo ===;
  for i in {0..50}; do python -c '

from kubernetes import client
coreapi = client.CoreV1Api()
print(0)' &

  done
wait
done"

This will run Python in a Docker container, install the Kubernetes Python module, then run the test indefinitely. The test starts a simple application 50 times in parallel in order to increase the probability of hitting the bug. This application loads the Kubernetes Python module, creates a CoreV1Api, which creates its ApiClient (with async enabled using a ThreadPool), then prints 0, showing that any freeze occurs during the Python exit sequence.

To stop the test:

docker rm -f testing

Expected:

This code should run indefinitely.

Result:

After some time, the loop hangs after printing a list of 0s.

Workaround:

To avoid this issue, we override the ApiClient class to disable the async / ThreadPool feature. It seems to work without any issues so far. The downside is that we lose async mode.

Thank you.

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 9
  • Comments: 20 (4 by maintainers)

Most upvoted comments

This seems to be related to the __del__ method on ApiClient cleaning up its ThreadPool: https://github.com/kubernetes-incubator/client-python/blob/30b8ee44f4d14546e651dead91306719d53f8c37/kubernetes/client/api_client.py#L76-L78

This can cause a deadlock when the api clients are garbage collected as Python exits. I can reproduce with the following:

# deadlock.py
from multiprocessing.pool import ThreadPool

class Deadlocker:
    def __init__(self):
        self.pool = ThreadPool()

    def __del__(self):
        self.pool.close()
        self.pool.join()

d1 = Deadlocker()
d2 = Deadlocker()
print('exiting...')

and running:

while true; do python3 deadlock.py; done

On macOS 10.12.6 with Python 3.6.3, after 1-50 executions, it will print “exiting…” and stall. The Python process won’t ever terminate until you hit Ctrl-C. I can also reproduce on Linux, but it seems to be less frequent than on macOS.
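
The shutdown ordering appears to be the culprit: if the pools are shut down explicitly before the interpreter starts tearing things down, the hang does not occur. A minimal sketch of that idea (the close() method here is an addition for illustration, not part of the snippet above):

# no_deadlock.py
from multiprocessing.pool import ThreadPool

class Deadlocker:
    def __init__(self):
        self.pool = ThreadPool()

    def close(self):
        # Shut the pool down explicitly while the interpreter is still healthy,
        # instead of relying on __del__ during shutdown garbage collection.
        self.pool.close()
        self.pool.join()

d1 = Deadlocker()
d2 = Deadlocker()
d1.close()
d2.close()
print('exiting...')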

This means a simple script like this:

from kubernetes import client
coreapi = client.CoreV1Api()
batchapi = client.BatchV1Api()

may never terminate because CoreV1Api and BatchV1Api will both instantiate ApiClients, which have the problematic __del__ method. We can reduce the likelihood of a deadlock by creating a single ApiClient and passing it into both CoreV1Api and BatchV1Api (see the sketch below), but the problem doesn’t go away entirely. There are also some classes, like Watch, that always instantiate their own ApiClient.
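
For reference, the single-shared-ApiClient mitigation looks roughly like this. It only reduces the number of ThreadPools created; it does not remove the problematic __del__:

from kubernetes import client

api_client = client.ApiClient()          # one ThreadPool instead of two
coreapi = client.CoreV1Api(api_client)
batchapi = client.BatchV1Api(api_client)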

I wonder whether the multiprocessing.pool.ThreadPool class is suitable for production use cases. According to a Stack Overflow comment I came across:

The multiprocessing.pool.ThreadPool is not documented as its implementation has never been completed. It lacks tests and documentation.
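
For comparison, the documented concurrent.futures.ThreadPoolExecutor registers its own exit handler to join worker threads, so a pattern like the one below does not rely on __del__ running during interpreter shutdown. This is only an illustration of the alternative, not something the kubernetes client currently does:

from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
future = executor.submit(pow, 2, 10)   # asynchronous call, like pool.apply_async
print(future.result())                 # 1024
executor.shutdown(wait=True)            # explicit, deterministic cleanup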

@furkanmustafa: You can’t reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@spacez320 The workaround was added and it’s available in the latest version 9.0.0a1.

Hi @Sturgelose,

The client.CoreV1Api() object uses dependency injection to get its ApiClient. We can use this to create our own ApiClient and then inject it.

To create our own K8sApiClient, we inherit from ApiClient and override a few methods.

We are talking about this Class: https://github.com/kubernetes-client/python/blob/v4.0.0/kubernetes/client/api_client.py#L32

Here comes the patched K8sApiClient:

# Note: this targets kubernetes client 4.0.0, where the parameter is still
# named `async` (a reserved keyword from Python 3.7 onwards; later client
# versions renamed it to `async_req`).
from kubernetes import client
from kubernetes.client.configuration import Configuration
from kubernetes.client.rest import RESTClientObject


class K8sApiClient(client.ApiClient):
    def call_api(self, *args, async=None, **kwargs):
        # Force synchronous calls so the ThreadPool is never needed.
        return super().call_api(*args, async=False, **kwargs)

    def __init__(self, configuration=None, header_name=None, header_value=None, cookie=None):
        # Same as ApiClient.__init__, minus the ThreadPool creation.
        if configuration is None:
            configuration = Configuration()
        self.configuration = configuration

        # self.pool = ThreadPool()
        self.rest_client = RESTClientObject(configuration)
        self.default_headers = {}
        if header_name is not None:
            self.default_headers[header_name] = header_value
        self.cookie = cookie
        # Set default User-Agent.
        self.user_agent = 'Swagger-Codegen/4.0.0/python'

    def __del__(self):
        # No ThreadPool to clean up, so nothing to do here.
        pass

Then how to use it:

        config = client.Configuration()
        api_client = K8sApiClient(configuration=config)
        coreapi = client.CoreV1Api(api_client)
        batchapi = client.BatchV1Api(api_client)

This is just copied and pasted from a project, so it isn’t tested code, but it should give you everything you need to work around this ThreadPool issue.

@johnmarcou @RobbieClarken I’m having exactly the same situation as @sbconsulting and I’m trying to find a workaround for this issue. Would you mind sharing some code that you are using to temporarily patch this issue? I can see that John reported that they override the ApiClient class and disabled the ThreadPool feature, but it would be good to know how you did it in case other people run into the same issue as all of us.

Anyways, thanks a lot for the discussion in this issue. I’ve spent a week hunting this ghost bug!