kubernetes: kubectl logs should *not* fail while a pod is being started, because it makes for a terrible initial experience
Currently the following can fail:
$ kubectl create -f pod.yaml
pod "foo" created
$ kubectl logs -f foo
container "build" in pod "build" is waiting to start: ContainerCreating
For real world cases where it may take tens of seconds to pull the image, this is intensely frustrating to users because now they have to guess when the pod is started in order to debug it.
For users who specify `kubectl logs foo`, it may be ok to return immediately (we need to debate this). `-f` is different because the expectation is to wait to see the results of the pod execution.
We need to fix the kubelet to return errors that allow clients to make the determination (probably by improving the types of errors returned by `validateContainerLogStatus` to include structured API errors) and ensure that the pod log (and other log endpoints, like deploy logs) can properly handle surfacing that. Then we either need to change the logs endpoint for follow to have that behavior (reasonable) or change the clients (more work, harder to justify). The other log endpoints should behave consistently.
We need to take backwards compatibility into account on errors - right now we always return 400 Bad Request, but we could add detail causes. The `GenericHTTPResponseChecker` would probably need to be wrapped with a more powerful `PossibleAPIOrGenericHTTPResponseChecker` that tested for JSON output and did the right thing.
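As an illustration of what clients are left doing today, a minimal sketch of a blind retry loop (the pod name is a placeholder, not a proposed fix):

```bash
#!/usr/bin/env bash
# Keep calling `kubectl logs -f` until the container has actually started.
# Today the server only answers 400 Bad Request ("... is waiting to start:
# ContainerCreating"), so all a client can do is sleep and try again.
pod=foo   # placeholder
until kubectl logs -f "$pod"; do
  sleep 2
done
```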
About this issue
- Original URL
- State: open
- Created 8 years ago
- Reactions: 85
- Comments: 30 (11 by maintainers)
Thanks @stszap.

We found the `Initialized` condition was not ready enough to be able to start watching the logs, but `ContainersReady` is.

It turns out waiting on the pod to start can be done by using the automatically provided `job-name` label as the selector. I'm not sure whether that's GKE-specific though.

I haven't investigated whether running `kubectl logs --follow job/db-migrate` in the foreground is reliable enough to determine that the job has finished; so we run it in the background and kill it once the job's done. But this has the side-effect of not exiting the script if the job fails - so more investigation is required.

Here's basically what we're using to run a job - as `scripts/migrate-db` - in order to have our database migrations run as a separate Buildkite pipeline step before deploying the rest of the app:

CC @jpaulgs, @yob, @ceralena
Update: here is our new `kube-run-pod` script. In the above, the `ContainersReady` waiting fails in the rare case that the cronjob hasn't yet created a pod. From that, we realised we didn't need it to be a job anyway - creating and deleting a pod is sufficient.

Making this p2 because I can't yet justify it, but this is intensely irritating to new users.
100% agree with @smarterclayton - this is immensely irritating, and I would add completely ridiculous that it has seen hardly any attention since 2016. Workarounds are possible, but should each team dedicate time to deploying a workaround to make essential functionality work? Or is there something wrong with a workflow where one expects to be able to view the status and output of a pod as soon as it's deployed? It's just kind of odd that this isn't a problem for more people - am I using Kubernetes wrong?
@mfojtik @kargakis - has there been any movement on this issue?
`kubectl logs` has the `--pod-running-timeout=90s` flag but it doesn't appear to have any effect. It appears that the only workaround is an involved Python script that checks for the job status and then polls until it starts receiving logs… it feels like `kubectl logs` should support this out of the box.

This is the bad experience that @smarterclayton mentioned:
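$ kubectl logs -f xxxx-0
Error from server (BadRequest): container "xxxx-0" in pod "xxxx-0-6cqdt" is waiting to start: ContainerCreating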
None of the above solutions work…

1. `kubectl logs` doesn't wait and returns an error (really frustrating!)
2. `Initialized` returns too early and you may get an error from a subsequent `kubectl logs`: Error from server (BadRequest): container "job-proc" in pod is waiting to start: ContainerCreating
3. `ContainersReady` gets stuck forever if the job fails (a container may never get ready… try to run a job with a simple failing command like `cat notexists.txt`)
4. `complete` has the same issue as (3)

It's really a pain because we need a simple way to display the logs of the Job while it is executing (or at least when it ends, regardless of success or failure).
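One possible workaround sketch along those lines (not from this thread; the job name is a placeholder): wait until the container is either running or already terminated - i.e. no longer waiting - and only then stream the logs, so it behaves the same whether the job succeeds or fails:

```bash
#!/usr/bin/env bash
job=myjob   # placeholder

while true; do
  # Empty while the pod doesn't exist yet; contains "waiting"
  # (e.g. ContainerCreating) until the container starts or fails.
  state=$(kubectl get pod -l "job-name=${job}" \
    -o jsonpath='{.items[0].status.containerStatuses[0].state}' 2>/dev/null || true)
  case "$state" in
    *running*|*terminated*) break ;;
    *) sleep 2 ;;
  esac
done

kubectl logs -f "job/${job}"
```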
I’ve had a look around and am working on a solution for this.
/assign
There are changes necessary in the kubelet API as well so
/sig node
@jeff1985 I use rollout status when updating deployments and want to wait until the new version is up (e.g. `kubectl rollout status --timeout=5m deploy/mydeployment`). There is also the wait command (e.g. `kubectl wait --for=condition=Ready --timeout=5m po/mypod`). You can wait for a job to complete by running `kubectl wait --for=condition=Complete job/myjob` before `kubectl logs`, but the job has to exist for `kubectl wait` to work and additional tests may be necessary for that. If you want to see the logs in real time then you can add a label to the job's pod template and use something like this: `kubectl wait --for=condition=Initialized po -l label=value`.

I'm considering queueing this up for 1.12 - I want to make a few targeted improvements across the board. Need to talk about it in sig-cli
On May 14, 2018, at 11:59 PM, Bruno Bowden notifications@github.com wrote:
@mfojtik https://github.com/mfojtik @kargakis https://github.com/kargakis - has there been any movement on this issue?
kubectl logs has the --pod-running-timeout=90s flag but it doesn’t appear to have any effect. It appears that the only workaround is an involved Python script that checks for the job status and then polls until it starts receiving logs… it feels like kubectl logs should support this easily.
This is the bad experience that @smarterclayton https://github.com/smarterclayton mentioned:
$ kubectl logs -f xxxx-0
Error from server (BadRequest): container "xxxx-0" in pod "xxxx-0-6cqdt" is waiting to start: ContainerCreating
coreutils' `tail` has `-F` / `--retry` (keep trying to open a file even when it is, or becomes, inaccessible). However it's implemented, it would be super nice if `kubectl logs` had similar.

Is there a workaround for this problem? I currently have a use case where I'm trying to catch the output of a started Job in CI, and I was thinking that I could at least wait for the "active" status of the job from a shell script.

But no! It turns out that the Job is already marked as active while the pod is still creating. This makes the logs command unusable in CI!
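For reference, a sketch of that kind of active-status check (not the commenter's actual script; the job name is a placeholder) and why it is not enough:

```bash
#!/usr/bin/env bash
job=myjob   # placeholder

# Wait until the Job reports at least one active pod.
until [ -n "$(kubectl get "job/${job}" -o jsonpath='{.status.active}' 2>/dev/null)" ]; do
  sleep 1
done

# Problem: .status.active is set as soon as the pod object exists, while the
# container can still be in ContainerCreating, so this can still fail:
kubectl logs -f "job/${job}"
```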
Coming here to note, too, that the suggested workaround - waiting for `ContainersReady` - doesn't work when debugging container startup issues. In my case, I have a one-off job I'm trying to run by just creating a pod manually, and because it crashes it never reaches the `ContainersReady` state. Not sure if there's a different state I could wait for, but it would be nice if I didn't have to…