kubernetes: preStop hooks throwing FailedPreStopHook warning events
Kubernetes version (use `kubectl version`): 1.4.5
Environment:
- Cloud provider or hardware configuration: minikube, aws
- OS (e.g. from /etc/os-release): boot2docker (alpine), Debian Jessie
- Kernel (e.g. `uname -a`): 4.4.14-boot2docker, 4.4.26-k8s
- Install tools: minikube, kops
- Others:
What happened:
I’m trying to use `preStop` to send custom signals to containers that do not handle SIGTERM gracefully. The use case is nearly identical to what’s described under Lifecycle hooks and termination notice. The challenge I’m running into (which I believe also applies to that example) is that SIGTERM is sent immediately after the `preStop` hook finishes execution. This means that sending a signal to the container (e.g. via `nginx -s quit` or `pkill -QUIT -f unicorn_rails`) results in SIGTERM arriving almost immediately afterward, before the process has time to react. The result is an ungraceful shutdown and forcibly terminated connections.
To work around this, I attempted to add a `sleep` in the `preStop` command:
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.11
        imagePullPolicy: Always
        name: nginx
        lifecycle:
          preStop:
            exec:
              command: ["bash", "-c", "/usr/sbin/nginx -s quit ; sleep 29"]
      restartPolicy: Always
```
This works great, except that it now emits a `FailedPreStopHook` warning event:
```
1m 1m 1 nginx-616817687-5ue6l Pod Normal Scheduled {default-scheduler } Successfully assigned nginx-616817687-5ue6l to minikube
1m 1m 1 nginx-616817687-5ue6l Pod spec.containers{nginx} Normal Pulling {kubelet minikube} pulling image "nginx:1.11"
1m 1m 1 nginx-616817687-5ue6l Pod spec.containers{nginx} Normal Pulled {kubelet minikube} Successfully pulled image "nginx:1.11"
1m 1m 1 nginx-616817687-5ue6l Pod spec.containers{nginx} Normal Created {kubelet minikube} Created container with docker id 27a1db055710; Security:[seccomp=unconfined]
1m 1m 1 nginx-616817687-5ue6l Pod spec.containers{nginx} Normal Started {kubelet minikube} Started container with docker id 27a1db055710
1m 1m 1 nginx-616817687-5ue6l Pod spec.containers{nginx} Warning FailedPreStopHook {kubelet minikube}
1m 1m 1 nginx-616817687-5ue6l Pod spec.containers{nginx} Normal Killing {kubelet minikube} Killing container with docker id 27a1db055710: Need to kill pod.
```
When deleting the pod, the kubelet logs show messages like:
```
E1222 20:58:04.915004 3157 docker_manager.go:1502] preStop hook for container "8aa630583d4d6e5c3491bc2f7ee82e021f8301686a2f6c597fca72ba6bf6e628 nginx default/nginx-1677451563-nl78u" failed: Error executing in Docker Container: 137
I1222 20:58:04.973272 3157 operation_executor.go:803] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/55393339-c889-11e6-9614-0a874283f159-default-token-queh6" (spec.Name: "default-token-queh6") pod "55393339-c889-11e6-9614-0a874283f159" (UID: "55393339-c889-11e6-9614-0a874283f159").
E1222 20:58:05.093857 3157 docker_manager.go:746] Logging security options: {key:seccomp value:unconfined msg:}
I1222 20:58:05.251345 3157 docker_manager.go:2162] Determined pod ip after infra change: "nginx-616817687-5ue6l_default(55393339-c889-11e6-9614-0a874283f159)": "172.17.0.2"
W1222 20:58:05.491270 3157 docker_manager.go:1537] No ref for pod '"8aa630583d4d6e5c3491bc2f7ee82e021f8301686a2f6c597fca72ba6bf6e628 nginx default/nginx-1677451563-nl78u"'
E1222 20:58:05.491272 3157 docker_manager.go:1502] preStop hook for container "8aa630583d4d6e5c3491bc2f7ee82e021f8301686a2f6c597fca72ba6bf6e628 nginx default/nginx-1677451563-nl78u" failed: container not running (8aa630583d4d6e5c3491bc2f7ee82e021f8301686a2f6c597fca72ba6bf6e628)
W1222 20:58:05.491387 3157 docker_manager.go:1552] No ref for pod '"nginx-1677451563-nl78u"'
W1222 20:58:05.492528 3157 docker_manager.go:1537] No ref for pod '"11d757dea91f1d6ccff888cb576bbf3c83ad0845870ec064535341c893286f4a default/nginx-1677451563-nl78u"'
I1222 20:58:05.673675 3157 reconciler.go:188] UnmountVolume operation started for volume "kubernetes.io/secret/e3fab6bd-c888-11e6-9614-0a874283f159-default-token-queh6" (spec.Name: "default-token-queh6") from pod "e3fab6bd-c888-11e6-9614-0a874283f159" (UID: "e3fab6bd-c888-11e6-9614-0a874283f159").
I1222 20:58:05.701340 3157 operation_executor.go:878] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/e3fab6bd-c888-11e6-9614-0a874283f159-default-token-queh6" (OuterVolumeSpecName: "default-token-queh6") pod "e3fab6bd-c888-11e6-9614-0a874283f159" (UID: "e3fab6bd-c888-11e6-9614-0a874283f159"). InnerVolumeSpecName "default-token-queh6". PluginName "kubernetes.io/secret", VolumeGidValue ""
E1222 20:58:06.084941 3157 docker_manager.go:1502] preStop hook for container "8aa630583d4d6e5c3491bc2f7ee82e021f8301686a2f6c597fca72ba6bf6e628 nginx default/nginx-1677451563-nl78u" failed: container not running (8aa630583d4d6e5c3491bc2f7ee82e021f8301686a2f6c597fca72ba6bf6e628)
```
Notably, it says the hook failed because the container is not running. The exit code of 137 indicates a SIGKILL was sent somewhere along the line, perhaps as a result of PID 1 being killed from within the `preStop` hook.
Ultimately, this works: containers appear to be gracefully shut down when a sleep is provided. However, the requirement for the sleep is undocumented and surprising. There are alternative ways to work around the problem of handling SIGTERM (write a bash wrapper around the startup command, or set `STOPSIGNAL` in the Dockerfile), but `preStop` seems specifically designed to handle this case, and I’d love to see it work more gracefully.
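The bash-wrapper alternative mentioned above can be sketched as follows. This is a hypothetical entrypoint, not code from the issue: it assumes the nginx image, and translates the SIGTERM that Kubernetes sends into the SIGQUIT that nginx handles as a graceful quit, removing the need for a `preStop` hook entirely.

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint wrapper (illustrative). Run the server as a
# child process, trap the SIGTERM Kubernetes sends, and forward the
# signal the process actually shuts down gracefully on.

/usr/sbin/nginx -g 'daemon off;' &   # run nginx as a child, not via exec
child=$!

graceful_shutdown() {
  kill -QUIT "$child"   # nginx interprets SIGQUIT as "graceful quit"
  wait "$child"         # block until in-flight connections drain
}

trap graceful_shutdown TERM
wait "$child"
```

Because the wrapper is PID 1 and handles SIGTERM itself, the grace-period countdown still applies: `terminationGracePeriodSeconds` must be long enough for connections to drain.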
What you expected to happen:
- The `preStop` hook should be able to trigger a container stop without errors occurring.
- The (documented) behavior of SIGTERM being sent immediately after `preStop` completes makes `preStop` challenging to use for the example use case of handling signals. Perhaps a configurable delay could be introduced so a `sleep` is not required.
- The documentation describing this use case should be updated to cover graceful shutdown correctly, which requires giving the process time to quiesce.
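To make the timing constraint explicit: the `sleep` must complete within the pod's termination grace period (the standard `terminationGracePeriodSeconds` Pod spec field, which defaults to 30 seconds), or the kubelet SIGKILLs the container mid-hook, matching the 137 exit code above. A minimal sketch, with an illustrative 60-second grace period:

```yaml
spec:
  terminationGracePeriodSeconds: 60   # illustrative; must exceed the preStop sleep
  containers:
  - name: nginx
    image: nginx:1.11
    lifecycle:
      preStop:
        exec:
          command: ["bash", "-c", "/usr/sbin/nginx -s quit ; sleep 29"]
```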
How to reproduce it (as minimally and precisely as possible):
- Apply a deployment as in the YAML above
- `kubectl delete pod` the created pod
- Observe that `FailedPreStopHook` warnings occur in `kubectl get events`
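The steps above amount to the following commands (the manifest filename is hypothetical; the `app=nginx` label comes from the Deployment above):

```shell
kubectl apply -f nginx-prestop.yaml     # the Deployment shown earlier
kubectl delete pod -l app=nginx         # triggers the preStop hook
kubectl get events | grep FailedPreStopHook
```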
Anything else do we need to know:
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 23
- Comments: 18 (4 by maintainers)
Commits related to this issue
- Use dumb-init for ovndb/ovncontroller pods In these pods we are using bash scripts as entrypoint and with these SIGTERM signals are not forwarded to child processes, Use dumb-init to handle graceful ... — committed to karelyatin/ovn-operator by karelyatin 7 months ago
Hi, any news here?
does not work on k8s 1.9:
I can confirm that this works properly in k8s 1.11.6 (built with kops 1.11 on AWS). As long as the pod's `terminationGracePeriodSeconds` is longer than the duration of the `sleep` interval, everything exits gracefully now. Many thanks to @SpeedVan.

We’re experiencing the same thing in 1.9.
The common use case here is wanting to send a graceful exit signal and wait for the process to complete before returning from the `preStop` hook. As noted above, the container runtime kills the process running the preStop hook (exit code 137, i.e. SIGKILL) before the hook is able to return.
/remove-lifecycle rotten
/lifecycle frozen
Removing lifecycle stagnation, as this is still being reported in 1.9.
This hurts signal-to-noise when auditing for behavioral concerns in prod environments.