kubernetes: terminationGracePeriodSeconds greater than 10 minutes not working as expected

What happened: A pod with a termination grace period of 3 hours is killed 10 minutes after SIGTERM is sent.

What you expected to happen: I expected the pod to get the full 3 hours before SIGKILL is sent.

How to reproduce it (as minimally and precisely as possible): Run a long-running process in a pod with a termination grace period greater than 20 minutes, then delete the pod. The pod is killed after 10 minutes.

Anything else we need to know?: From kubectl get events:

    33m  Normal   ScaleDown      pod/jarvis-6f9c9c79d6-d7vhr  deleting pod for node scale down
    33m  Normal   Killing        pod/jarvis-6f9c9c79d6-d7vhr  Stopping container jarvis
    23m  Warning  FailedKillPod  pod/jarvis-6f9c9c79d6-d7vhr  error killing pod: failed to "KillContainer" for "jarvis" with KillContainerError: "rpc error: code = Unknown desc = operation timeout: context deadline exceeded"

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.42", GitCommit:"42bef28c2031a74fc68840fce56834ff7ea08518", GitTreeState:"clean", BuildDate:"2020-06-02T16:07:00Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: GKE
  • OS (e.g: cat /etc/os-release):
    NAME="Ubuntu"
    VERSION="18.04.5 LTS (Bionic Beaver)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 18.04.5 LTS"
    VERSION_ID="18.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=bionic
    UBUNTU_CODENAME=bionic
  • Kernel (e.g. uname -a): Linux chakradarraju-lenovo 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 19 (9 by maintainers)

Most upvoted comments

We did some digging. It looks like the nodes were being removed by the cluster autoscaler, which allows a maximum of 10 minutes for pods to shut down gracefully. From what we read, it sounds like there is no way to configure this limit.
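The arithmetic behind this explanation can be sketched as follows. This is a simplified model of the behavior described in the comment, not the autoscaler's actual code: `effective_grace_seconds` is a hypothetical helper, and the 600-second cap is just the 10-minute limit mentioned above.

```python
# Values taken from this report; the cap is the 10-minute limit described in the comment.
POD_GRACE_SECONDS = 3 * 60 * 60    # terminationGracePeriodSeconds of 3 hours
AUTOSCALER_CAP_SECONDS = 10 * 60   # maximum graceful shutdown time during scale down


def effective_grace_seconds(pod_grace, cap):
    """Return the grace period a pod effectively gets when a cap applies."""
    return min(pod_grace, cap)


print(effective_grace_seconds(POD_GRACE_SECONDS, AUTOSCALER_CAP_SECONDS))  # prints 600
```

Under this model, any pod grace period above the cap is truncated to 10 minutes during a node scale down, which matches the gap between the Killing event (33m) and the FailedKillPod event (23m) in the report.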