kubernetes: Documentation on what constitutes a failed container with respect to restart policy is confusing
/kind bug
What happened: I have a deployment that I would like to process some data, exit, and restart. It’s important to the application that it has a clean environment every time. I don’t need to impose any kind of scheduling on it; it should just run, exit, and loop.
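For concreteness, here is a minimal sketch of the kind of Deployment I mean (names, image, and command are placeholders). Note that a Deployment’s pod template only accepts restartPolicy: Always, so the kubelet restarts the container after every exit, including exit 0:

```yaml
# Hypothetical sketch; names, image, and command are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-loop
spec:
  replicas: 1
  selector:
    matchLabels:
      app: batch-loop
  template:
    metadata:
      labels:
        app: batch-loop
    spec:
      restartPolicy: Always      # the only value a Deployment accepts
      containers:
      - name: worker
        image: example.com/batch-worker:latest   # placeholder image
        command: ["/process-data"]               # runs, exits 0, should restart
```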
After reading these docs:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
In particular:
Failed Containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …) capped at five minutes
I assumed that, as long as my process exits successfully (i.e., with exit status 0), Kubernetes wouldn’t be silly enough to consider it “failed”. But, of course, it does, so my process ends up in a CrashLoopBackOff state. I didn’t think Kubernetes was silly, so I spent a lot of time trying to diagnose why my process was considered “failed” (from making sure it was actually returning 0 in various ways, to reading through issues on GitHub, like this one: https://github.com/kubernetes/kubernetes/issues/50962 )
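For reference, this is roughly the status such a pod reports (a trimmed, illustrative sketch of `kubectl get pod -o yaml` output; names and counts are placeholders). The confusing combination is that the last termination shows exit code 0 with reason Completed, yet the container sits in CrashLoopBackOff:

```yaml
# Trimmed sketch of pod status; field values are illustrative.
status:
  containerStatuses:
  - name: worker
    lastState:
      terminated:
        exitCode: 0                  # the process exited cleanly...
        reason: Completed
    state:
      waiting:
        reason: CrashLoopBackOff     # ...but the kubelet still backs off
    restartCount: 5
```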
Other documentation seems to indicate that a container that exits 0 should not be considered “failed”: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase
Failed: All Containers in the Pod have terminated, and at least one Container has terminated in failure. That is, the Container either exited with non-zero status or was terminated by the system
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes :
The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success
Yet, in the one GitHub issue I’ve found that touches on this: https://github.com/kubernetes/kubernetes/issues/50962
This is intentional, as @jianzhangbjz noted. I think a cron job may be better suited for you.
the response indicates that, no: even if a container exits 0, it is treated as “failed”, and the back-off applies. That matches my experience, but not the documentation.
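For what it’s worth, the alternative the maintainer suggests would look roughly like this (a sketch only; the schedule, names, and image are placeholders). It avoids the back-off by running a fresh Job on a schedule instead of restarting a long-lived container:

```yaml
# Hypothetical sketch of the suggested CronJob alternative.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: batch-loop
spec:
  schedule: "*/5 * * * *"        # every five minutes, for example
  concurrencyPolicy: Forbid      # don't let runs overlap
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure   # exit 0 ends the run; non-zero retries
          containers:
          - name: worker
            image: example.com/batch-worker:latest   # placeholder image
            command: ["/process-data"]
```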
What you expected to happen:
I expect to be able to read the documentation and clearly understand what happens when a container exits with a clean return code (0) and a restart policy of Always.
How to reproduce it (as minimally and precisely as possible):
Launch a container that runs, processes data, and exits 0, with a restart policy of Always (a sketch manifest follows). Watch it go into a CrashLoopBackOff state, then try to understand why by reading the Kubernetes documentation. You will find that the documentation is unclear at best, and misleading at worst.
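A minimal reproduction manifest (the image and command are placeholders; any container that exits 0 will do):

```yaml
# Minimal reproduction: a pod whose container exits 0 immediately.
# With restartPolicy: Always, the pod soon reports CrashLoopBackOff.
apiVersion: v1
kind: Pod
metadata:
  name: exit-zero
spec:
  restartPolicy: Always
  containers:
  - name: worker
    image: busybox                  # any image with a shell works
    command: ["sh", "-c", "echo done; exit 0"]
```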
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`):
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. `uname -a`):
- Install tools:
- Others:
This issue was closed by a bot and moved by automation to Done. This continues to be an ambiguous bug/feature that needs addressing, at least in docs.
/reopen
Has this ever been addressed, or will it be? Seems really odd. For example, we have microservices that run a fixed number of iterations, finish, and restart. There’s a definite use case for having exit codes interpreted correctly.