python: Extremely slow deserialization due to lock contention and logging module

We are experiencing performance issues when doing API calls from many threads at the same time. The issue only occurs on certain clusters, and we suspect it happens when a high number of API calls occur simultaneously.

When looking at stack traces, we see contention around a lock in the logging module.

Thread XYZ (most recent call first):
  File \"/usr/local/lib/python3.9/logging/__init__.py\", line 225 in _acquireLock
  File \"/usr/local/lib/python3.9/logging/__init__.py\", line 1298 in getLogger
  File \"/usr/local/lib/python3.9/logging/__init__.py\", line 2042 in getLogger
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/configuration.py\", line 112 in __init__
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/models/v1_container_image.py\", line 48 in __init__
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 641 in __deserialize_model
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 303 in __deserialize
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 280 in <listcomp>
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 280 in __deserialize
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 639 in __deserialize_model
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 303 in __deserialize
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 639 in __deserialize_model
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 303 in __deserialize
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 280 in <listcomp>
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 280 in __deserialize
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 639 in __deserialize_model
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 303 in __deserialize
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 264 in deserialize
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 192 in __call_api
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 348 in call_api
  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py\", line 16517 in list_node_with_http_info
  File \"/usr/local/lib/python3.9/site-packages/hikaru/model/rel_1_16/v1/v1.py\", line 46011 in listNode
  ... application specific stuff

The above is from a single thread. Each stack trace contains about 20 threads like that, all stuck on different API calls in the client library, all waiting in _acquireLock. (There are several paths to _acquireLock - it’s not just getLogger but also functions like _clear_cache(), setLevel(), etc.)
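For reference, all-thread traces in this format ("most recent call first") can be captured with the standard-library faulthandler module on POSIX systems; a minimal sketch, assuming it is acceptable for the application to register a signal handler:

import faulthandler
import signal
import sys

# Dump the Python stack of every thread to stderr whenever the process
# receives SIGUSR1, without stopping the running threads.
faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)

# Alternatively, dump once on demand from anywhere in the code:
# faulthandler.dump_traceback(file=sys.stderr, all_threads=True)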

Is there a fundamental reason why deserializing a Kubernetes object must involve a call to getLogger() and other logging functions?

If not, it would be great to improve performance here by avoiding calls to lock-taking logging functions during deserialization.
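For context, the trace suggests that every deserialized model constructs a fresh Configuration object, whose __init__ resolves loggers by name via logging.getLogger(); getLogger() acquires the logging module's global lock, so many threads deserializing large lists end up serializing on it. Below is a simplified, hypothetical sketch of that pattern and of a lock-free alternative; it is not the client's actual code, and CachedConfiguration is an illustrative name only.

import logging

# Pattern suggested by the stack trace (simplified, hypothetical):
# each deserialized model builds a new Configuration, and the
# constructor looks up loggers by name every time.
class Configuration:
    def __init__(self):
        # logging.getLogger() takes the module-level lock
        # (_acquireLock in the trace) once per constructed object.
        self.logger = {
            "package_logger": logging.getLogger("kubernetes.client"),
            "urllib3_logger": logging.getLogger("urllib3"),
        }

# Lock-free alternative: resolve the loggers once at import time and
# reuse the same objects, so no lock is taken per deserialized model.
_PACKAGE_LOGGER = logging.getLogger("kubernetes.client")
_URLLIB3_LOGGER = logging.getLogger("urllib3")

class CachedConfiguration:
    def __init__(self):
        self.logger = {
            "package_logger": _PACKAGE_LOGGER,
            "urllib3_logger": _URLLIB3_LOGGER,
        }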

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Reactions: 9
  • Comments: 15 (1 by maintainers)

Most upvoted comments

The issue was reported back in 2019 (https://github.com/kubernetes-client/python/issues/836) and went rotten without any reply. I also looked into this because of the memory issue; it's not AKS-specific, and it's not bound to the Python version used.

Here’s a workaround: comment out the logger lines in rest.py (the root of the problem) in a Dockerfile during the image build:

find /usr/local/lib/python*/dist-packages/kubernetes/client/rest.py -type f -exec sed -i 's:^\(.*logger.*\)$:#\1:' {} \;
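A less invasive alternative (untested sketch) is to raise the level of the client's loggers at startup so the per-response debug logging in rest.py is skipped without patching the installed package. The logger names below are assumptions derived from the module paths in the stack trace; note that this only avoids formatting and emitting log records, it does not remove the per-object getLogger() calls, so the lock contention described above may remain.

import logging

# Assumed logger names, based on the module paths seen in the trace;
# adjust if the installed client uses different names.
for name in ("kubernetes.client.rest", "kubernetes.client", "urllib3"):
    logging.getLogger(name).setLevel(logging.WARNING)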