kubernetes: terminationGracePeriodSeconds greater than 10 minutes not working as expected
What happened: A pod with a termination grace period of 3 hours is killed 10 minutes after SIGTERM.
What you expected to happen: I expected the pod to get the full 3 hours before SIGKILL is sent.
How to reproduce it (as minimally and precisely as possible): Run a long-running process in a pod with a termination grace period greater than 20 minutes, then delete the pod. It is killed after 10 minutes.
Anything else we need to know?: From kubectl get events:
33m Normal ScaleDown pod/jarvis-6f9c9c79d6-d7vhr deleting pod for node scale down
33m Normal Killing pod/jarvis-6f9c9c79d6-d7vhr Stopping container jarvis
23m Warning FailedKillPod pod/jarvis-6f9c9c79d6-d7vhr error killing pod: failed to "KillContainer" for "jarvis" with KillContainerError: "rpc error: code = Unknown desc = operation timeout: context deadline exceeded"
Environment:
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.42", GitCommit:"42bef28c2031a74fc68840fce56834ff7ea08518", GitTreeState:"clean", BuildDate:"2020-06-02T16:07:00Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: GKE
- OS (e.g: cat /etc/os-release):
  NAME="Ubuntu"
  VERSION="18.04.5 LTS (Bionic Beaver)"
  ID=ubuntu
  ID_LIKE=debian
  PRETTY_NAME="Ubuntu 18.04.5 LTS"
  VERSION_ID="18.04"
  HOME_URL="https://www.ubuntu.com/"
  SUPPORT_URL="https://help.ubuntu.com/"
  BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
  PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
  VERSION_CODENAME=bionic
  UBUNTU_CODENAME=bionic
- Kernel (e.g. uname -a): Linux chakradarraju-lenovo 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- State: closed
- Created 4 years ago
- Comments: 19 (9 by maintainers)
We did some digging. It looks like the nodes were being removed by the cluster autoscaler, which allows a maximum of 10 minutes for a graceful shutdown. From reading through it, it sounds like there is no way to configure this limit.
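To make the arithmetic concrete, here is a small sketch of the behavior described above. It assumes the autoscaler simply caps the wait at 10 minutes during a node scale-down, which matches the events in the report (Killing at 33m, FailedKillPod at 23m). The function name and the constant are hypothetical and are not taken from the autoscaler's code.

```python
# Assumption: the 10-minute limit reported in this thread, in seconds.
AUTOSCALER_CAP_SECONDS = 10 * 60


def effective_grace_seconds(pod_grace_seconds, cap_seconds=AUTOSCALER_CAP_SECONDS):
    """Time a pod actually gets during a scale-down: the smaller of the two limits."""
    return min(pod_grace_seconds, cap_seconds)


# The pod in this issue asked for 3 hours.
pod_grace = 3 * 60 * 60
print(effective_grace_seconds(pod_grace))  # 600
```

Under this model, a pod whose grace period is shorter than the cap is unaffected; only pods asking for more than 10 minutes see their shutdown cut short on scale-down.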