kubernetes: preStop hooks throwing FailedPreStopHook warning events

Kubernetes version (use kubectl version): 1.4.5

Environment:

  • Cloud provider or hardware configuration: minikube, aws
  • OS (e.g. from /etc/os-release): boot2docker (alpine), Debian Jessie
  • Kernel (e.g. uname -a): 4.4.14-boot2docker, 4.4.26-k8s
  • Install tools: minikube, kops
  • Others:

What happened:

I’m trying to use preStop to send custom signals to containers that do not handle SIGTERM gracefully. The use case is nearly identical to what’s described under Lifecycle hooks and termination notice. The challenge I’m running into (which I believe applies to that example as well) is that SIGTERM is sent immediately after the preStop hook finishes executing. This means that after sending a signal to the container (e.g. via nginx -s quit, or pkill -QUIT -f unicorn_rails), SIGTERM arrives almost immediately, before the process has had time to react. The result is an ungraceful shutdown and forcibly terminated connections.

To work around this, I attempted to add a sleep in the preStop command:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.11
        imagePullPolicy: Always
        name: nginx
        lifecycle:
          preStop:
            exec:
              command: ["bash", "-c", "/usr/sbin/nginx -s quit ; sleep 29"]
      restartPolicy: Always

This works great, except that it now throws a FailedPreStopHook warning event:

1m        1m        1         nginx-616817687-5ue6l    Pod                                   Normal    Scheduled           {default-scheduler }       Successfully assigned nginx-616817687-5ue6l to minikube
1m        1m        1         nginx-616817687-5ue6l    Pod          spec.containers{nginx}   Normal    Pulling             {kubelet minikube}         pulling image "nginx:1.11"
1m        1m        1         nginx-616817687-5ue6l    Pod          spec.containers{nginx}   Normal    Pulled              {kubelet minikube}         Successfully pulled image "nginx:1.11"
1m        1m        1         nginx-616817687-5ue6l    Pod          spec.containers{nginx}   Normal    Created             {kubelet minikube}         Created container with docker id 27a1db055710; Security:[seccomp=unconfined]
1m        1m        1         nginx-616817687-5ue6l    Pod          spec.containers{nginx}   Normal    Started             {kubelet minikube}         Started container with docker id 27a1db055710
1m        1m        1         nginx-616817687-5ue6l    Pod          spec.containers{nginx}   Warning   FailedPreStopHook   {kubelet minikube}
1m        1m        1         nginx-616817687-5ue6l    Pod          spec.containers{nginx}   Normal    Killing             {kubelet minikube}         Killing container with docker id 27a1db055710: Need to kill pod.

When deleting the pod, the kubelet logs show messages like:

E1222 20:58:04.915004    3157 docker_manager.go:1502] preStop hook for container "8aa630583d4d6e5c3491bc2f7ee82e021f8301686a2f6c597fca72ba6bf6e628 nginx default/nginx-1677451563-nl78u" failed: Error executing in Docker Container: 137
I1222 20:58:04.973272    3157 operation_executor.go:803] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/55393339-c889-11e6-9614-0a874283f159-default-token-queh6" (spec.Name: "default-token-queh6") pod "55393339-c889-11e6-9614-0a874283f159" (UID: "55393339-c889-11e6-9614-0a874283f159").
E1222 20:58:05.093857    3157 docker_manager.go:746] Logging security options: {key:seccomp value:unconfined msg:}
I1222 20:58:05.251345    3157 docker_manager.go:2162] Determined pod ip after infra change: "nginx-616817687-5ue6l_default(55393339-c889-11e6-9614-0a874283f159)": "172.17.0.2"
W1222 20:58:05.491270    3157 docker_manager.go:1537] No ref for pod '"8aa630583d4d6e5c3491bc2f7ee82e021f8301686a2f6c597fca72ba6bf6e628 nginx default/nginx-1677451563-nl78u"'
E1222 20:58:05.491272    3157 docker_manager.go:1502] preStop hook for container "8aa630583d4d6e5c3491bc2f7ee82e021f8301686a2f6c597fca72ba6bf6e628 nginx default/nginx-1677451563-nl78u" failed: container not running (8aa630583d4d6e5c3491bc2f7ee82e021f8301686a2f6c597fca72ba6bf6e628)
W1222 20:58:05.491387    3157 docker_manager.go:1552] No ref for pod '"nginx-1677451563-nl78u"'
W1222 20:58:05.492528    3157 docker_manager.go:1537] No ref for pod '"11d757dea91f1d6ccff888cb576bbf3c83ad0845870ec064535341c893286f4a default/nginx-1677451563-nl78u"'
I1222 20:58:05.673675    3157 reconciler.go:188] UnmountVolume operation started for volume "kubernetes.io/secret/e3fab6bd-c888-11e6-9614-0a874283f159-default-token-queh6" (spec.Name: "default-token-queh6") from pod "e3fab6bd-c888-11e6-9614-0a874283f159" (UID: "e3fab6bd-c888-11e6-9614-0a874283f159").
I1222 20:58:05.701340    3157 operation_executor.go:878] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/e3fab6bd-c888-11e6-9614-0a874283f159-default-token-queh6" (OuterVolumeSpecName: "default-token-queh6") pod "e3fab6bd-c888-11e6-9614-0a874283f159" (UID: "e3fab6bd-c888-11e6-9614-0a874283f159").
InnerVolumeSpecName "default-token-queh6". PluginName "kubernetes.io/secret", VolumeGidValue ""
E1222 20:58:06.084941    3157 docker_manager.go:1502] preStop hook for container "8aa630583d4d6e5c3491bc2f7ee82e021f8301686a2f6c597fca72ba6bf6e628 nginx default/nginx-1677451563-nl78u" failed: container not running (8aa630583d4d6e5c3491bc2f7ee82e021f8301686a2f6c597fca72ba6bf6e628)

Notably, it says the hook failed because the container is not running. The exit code of 137 (128 + 9) indicates a SIGKILL was delivered somewhere along the line, perhaps because PID 1 was killed from within the preStop hook.

Ultimately, this works - containers appear to be gracefully shut down when a sleep is provided. However, the requirement for the sleep is undocumented and surprising. There are other ways to work around the problem of handling SIGTERM (write a bash wrapper around the startup command, or set STOPSIGNAL in the Dockerfile), but preStop seems specifically designed for this case and I’d love to see it work more gracefully.
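For reference, the bash-wrapper alternative could be sketched directly in the pod spec’s command (a hypothetical example, assuming the stock nginx image; the exact startup flags may differ for other images):

```yaml
containers:
- name: nginx
  image: nginx:1.11
  # Hypothetical wrapper: run nginx in the foreground, and translate the
  # SIGTERM that Kubernetes sends into a graceful "nginx -s quit".
  command: ["bash", "-c", "nginx -g 'daemon off;' & pid=$!; trap 'nginx -s quit' TERM; wait $pid"]
```

This sidesteps preStop entirely, at the cost of baking shutdown logic into the container command.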

What you expected to happen:

  • The preStop hook should be able to trigger a container stop without errors occurring.
  • The (documented) behavior of SIGTERM being sent immediately after preStop completes makes preStop challenging to use for the example use case of handling signals. Perhaps a configurable delay could be introduced so sleep is not required.
  • The documentation, which describes this use case, should be updated to correctly handle graceful shutdown behavior, which requires time for the process to quiesce.

How to reproduce it (as minimally and precisely as possible):

  • Apply a deployment as in the YAML ☝️

  • Run kubectl delete pod on the created pod

  • Observe that FailedPreStopHook warnings occur in kubectl get events

Anything else we need to know:

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 23
  • Comments: 18 (4 by maintainers)

Most upvoted comments

Hi, any news here?

          lifecycle:
            preStop:
              exec:
                command: ["sleep", "120"]

does not work on k8s 1.9:

Normal   Started                34m   kubelet, ip-xxx.internal  Started container
Warning  FailedPreStopHook      3s    kubelet, ip-xxx.internal  Exec lifecycle hook ([sleep 120]) for Container "..." in Pod "..." failed - error: command 'sleep 120' exited with 137: , message: ""
Normal   Killing                3s    kubelet, ip-xxx.internal  Killing container with id docker://...:Need to kill Pod

I can confirm that this works properly in k8s 1.11.6 (built with kops 1.11 on AWS). As long as the pod’s terminationGracePeriodSeconds is longer than the duration of the sleep interval, everything exits gracefully now. Many thanks to @SpeedVan.
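The working configuration could look something like this (a sketch assuming a 30-second sleep; the key point is that terminationGracePeriodSeconds exceeds the sleep):

```yaml
spec:
  # Must be longer than the preStop sleep, or the hook is SIGKILLed (137).
  terminationGracePeriodSeconds: 60
  containers:
  - name: nginx
    image: nginx:1.11
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "30"]
```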

We’re experiencing the same thing in 1.9.

The common use case here is wanting to send a graceful exit signal and wait for a process to complete before returning the preStop hook.

As noted above, the container runtime kills (exit code 137, i.e. SIGKILL) the process running the preStop hook before the hook is able to return.
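That pattern could be sketched as a preStop hook that signals the process and then blocks until it has actually exited, rather than sleeping for a fixed interval (a hypothetical example using nginx, assuming pgrep is available in the image; swap in the appropriate signal and process name):

```yaml
lifecycle:
  preStop:
    exec:
      # Signal nginx to quit, then poll until the master process is gone.
      command: ["bash", "-c", "nginx -s quit; while pgrep -x nginx > /dev/null; do sleep 1; done"]
```

The pod’s terminationGracePeriodSeconds still needs to be long enough to cover the wait, or the hook will be killed mid-loop.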

/remove-lifecycle rotten
/lifecycle frozen

Removing lifecycle stagnation, as this is still being reported in 1.9.

This hurts signal-to-noise when auditing for behavioral concerns in prod environments.