python: Python freeze/hang on exit
Context
Hello,
In a batch manager project, we are using this Python client to submit jobs to the Kubernetes API. In short, the Python application loads the library, submits the job, watches the related events, cleans up (deletes) the job, then returns the succeeded or failed status. But sometimes the application hangs at exit.
After investigation, it seems the ThreadPool used in the ApiClient class is not properly cleaned up when the Python process exits.
Reproduce
The easiest way to reproduce is to run this snippet:
python_version=3.6
kubernetes_version=4.0.0
docker run --name testing --rm -it --entrypoint "" python:$python_version /bin/bash -c "
pip install 'kubernetes==$kubernetes_version'
while true; do echo ===;
for i in {0..50}; do python -c '
from kubernetes import client
coreapi = client.CoreV1Api()
print(0)' &
done
wait
done"
This will run Python in a Docker container, install the Kubernetes Python module, then run the test indefinitely. Each iteration starts a simple application about 50 times in parallel to increase the probability of hitting the hang. This application loads the Kubernetes Python module and creates a CoreV1Api, which creates its own ApiClient (with Async enabled using ThreadPool), then prints 0, showing that the freeze occurred during the Python exit sequence.
To stop the test:
docker rm -f testing
Expected:
This code should run indefinitely.
Result:
After some time, the loop hangs after printing a list of 0s.
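Watching the output for a stall is tedious, so the check can be automated. The following is a minimal sketch, not part of the original report: it runs a snippet in a fresh interpreter and treats a timeout as a hang. The helper names exits_cleanly and count_hangs and the 30-second timeout are assumptions chosen for illustration, and unlike the shell loop above it runs the snippet sequentially rather than in parallel.

```python
import subprocess
import sys


def exits_cleanly(code, timeout=30.0):
    """Run `code` in a fresh interpreter; return False if it does not exit in time."""
    try:
        subprocess.run(
            [sys.executable, "-c", code],
            stdout=subprocess.DEVNULL,
            timeout=timeout,  # on expiry the child is killed and TimeoutExpired is raised
            check=True,       # a non-zero exit status raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        return False
    return True


def count_hangs(code, runs=50, timeout=30.0):
    """Run the snippet `runs` times, one after another, and count the runs that hung."""
    return sum(not exits_cleanly(code, timeout) for _ in range(runs))


# The snippet from the report; it needs the kubernetes package installed.
# Example use: print(count_hangs(K8S_SNIPPET))
K8S_SNIPPET = "from kubernetes import client\ncoreapi = client.CoreV1Api()\nprint(0)"
```

A non-zero count from count_hangs(K8S_SNIPPET) would indicate that at least one interpreter failed to exit within the timeout.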
Workaround:
To avoid this issue, we override the ApiClient class to disable the Async / ThreadPool feature. It seems to work without any issues so far. The downside is that we lose the Async mode.
Thank you.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 9
- Comments: 20 (4 by maintainers)
Commits related to this issue
- blockaws cron hangs due to bug in K8s Python client https://github.com/kubernetes-client/python/issues/411 — committed to mozmeao/infra by bookshelfdave 6 years ago
- Add option to use kubectl to work around kubernetes-client/python#411 — committed to mozmeao/infra by jgmize 6 years ago
- Add option to use kubectl to work around kubernetes-client/python#411 — committed to mozmeao/infra by jgmize 6 years ago
- Add option to use kubectl to work around kubernetes-client/python#411 (#813) * Add option to use kubectl to work around kubernetes-client/python#411 * Use image built from previous commit in cron... — committed to mozmeao/infra by jgmize 6 years ago
- Workaround kubernetes python client deadlock issue The kubernetes python client has a bug [1] which results in frequent deadlocks while being cleaned up, which causes armada to hang at the end of exe... — committed to airshipit/armada by seaneagan 6 years ago
This seems to be related to the __del__ method on ApiClient cleaning up its ThreadPool: https://github.com/kubernetes-incubator/client-python/blob/30b8ee44f4d14546e651dead91306719d53f8c37/kubernetes/client/api_client.py#L76-L78
This can cause a deadlock when the api clients are garbage collected as Python exits. I can reproduce with the following:
and running:
On macOS 10.12.6 with Python 3.6.3, after 1-50 executions, it will print “exiting…” and stall. The python process won’t ever terminate until you hit ctrl-c. I can also reproduce on Linux but it seems to happen less frequently than on macOS.
This means a simple script like this:
may never terminate because
CoreV1Api and BatchV1Api will both instantiate ApiClients, which have the problematic __del__ method. We can reduce the likelihood of a deadlock by creating a single ApiClient and passing it into CoreV1Api and BatchV1Api, but the problem doesn’t go away entirely. There are also some classes like Watch that always instantiate their own ApiClient.
I wonder whether the multiprocessing.pool.ThreadPool class is suitable for production use cases. According to a stackoverflow comment I came across:
@furkanmustafa: You can’t reopen an issue/PR unless you authored it or you are a collaborator.
@spacez320 The workaround was added and it’s available in the latest version 9.0.0a1.
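For readers wondering what a fix of this kind can look like: one general approach is to create the thread pool lazily, only when an asynchronous call is first made, and to close it through an explicit method registered with atexit instead of relying on __del__. The sketch below illustrates that idea under my own assumptions and is not a copy of the library's code; LazyPoolClient, call, close and pool_threads are hypothetical names. Check api_client.py in the release to see what was actually changed.

```python
import atexit
from multiprocessing.pool import ThreadPool


class LazyPoolClient:
    """Hypothetical client that creates its ThreadPool only on first async use."""

    def __init__(self, pool_threads=1):
        self.pool_threads = pool_threads
        self._pool = None  # no worker threads exist until the pool is needed

    @property
    def pool(self):
        if self._pool is None:
            self._pool = ThreadPool(self.pool_threads)
            # Close the pool during normal interpreter shutdown, not in __del__.
            atexit.register(self.close)
        return self._pool

    def close(self):
        """Shut the pool down; safe to call more than once."""
        if self._pool is not None:
            self._pool.close()
            self._pool.join()
            self._pool = None

    def call(self, func, *args, async_req=False):
        if async_req:
            return self.pool.apply_async(func, args)
        return func(*args)


# A purely blocking client never starts a pool.
sync_client = LazyPoolClient()
sync_result = sync_client.call(len, "abc")

# An async caller gets a pool on demand and can close it explicitly.
async_client = LazyPoolClient()
async_result = async_client.call(len, "abcd", async_req=True).get(timeout=5)
async_client.close()
```

With this shape, scripts that only make blocking calls have no pool to tear down at exit, which is the situation in the reproduction above.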
Hi @Sturgelose,
The client.CoreV1Api() object uses dependency injection to get its ApiClient. We can use this to create our own ApiClient, then inject it.
To create our own K8sApiClient, we subclass ApiClient and override some of its methods.
We are talking about this Class: https://github.com/kubernetes-client/python/blob/v4.0.0/kubernetes/client/api_client.py#L32
Here comes the patched K8sApiClient:
Then how to use it:
This is just copied and pasted from a project; it is not tested code. But it should give you everything you need to disable the ThreadPool and avoid this issue.
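As a rough illustration of the subclass-and-inject idea described above, here is a minimal sketch that uses stand-in classes rather than the real kubernetes.client.ApiClient, so that it runs with the standard library alone. StandInApiClient, NoPoolApiClient, StandInCoreApi and the async_req parameter are all hypothetical names; this is not the patched K8sApiClient from the project, and the real ApiClient has a larger constructor and more methods.

```python
from multiprocessing.pool import ThreadPool


class StandInApiClient:
    """Stand-in for a client that eagerly creates a ThreadPool (hypothetical)."""

    def __init__(self):
        self.pool = ThreadPool(processes=1)

    def __del__(self):
        # Cleanup at garbage-collection time: the pattern suspected of deadlocking.
        if self.pool is not None:
            self.pool.close()
            self.pool.join()

    def call_api(self, func, *args, async_req=False):
        if async_req:
            return self.pool.apply_async(func, args)
        return func(*args)


class NoPoolApiClient(StandInApiClient):
    """Subclass that gives up async mode so no pool is left alive until exit."""

    def __init__(self):
        super().__init__()
        # Shut the pool down right away, while the interpreter is fully alive.
        self.pool.close()
        self.pool.join()
        self.pool = None

    def __del__(self):
        # Nothing is left to clean up when the object is collected.
        pass

    def call_api(self, func, *args, async_req=False):
        if async_req:
            raise NotImplementedError("async mode is disabled in this client")
        return func(*args)


class StandInCoreApi:
    """Stand-in for an API class that accepts an injected client (hypothetical)."""

    def __init__(self, api_client=None):
        self.api_client = api_client if api_client is not None else StandInApiClient()

    def echo(self, value):
        return self.api_client.call_api(str.upper, value)


# Inject the pool-free client instead of letting the API class build its own.
api = StandInCoreApi(api_client=NoPoolApiClient())
result = api.echo("ok")
```

The same injected instance can be shared by several API objects, which also keeps the number of clients (and pools) down.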
@johnmarcou @RobbieClarken I’m in exactly the same situation as @sbconsulting and I’m trying to find a workaround for this issue. Would you mind sharing the code you are using to temporarily patch it? I can see that John reported that they override the ApiClient class and disable the ThreadPool feature, but it would be good to know how you did it in case other people have the same issue as all of us.
Anyways, thanks a lot for the discussion in this issue, I’ve spent a week hunting this ghost bug!