kubernetes: log rotation on node breaks kubectl logs --follow / docker logs --follow

Hello,

There is an upstream problem with docker’s json-file log driver: ‘docker logs --follow’ (and, by extension, ‘kubectl logs --follow’) stops producing output when the node that hosts the given container rotates that container’s logs.

Environment:

Running kubernetes 1.2.4 on AWS.

Impact:

logspout or other tools that rely on ‘docker logs --follow’ / ‘kubectl logs --follow’ break during log rotation.

Steps to reproduce:

  • Launch a pod that generates regular stdout (see the example after this list).
  • Run kubectl logs --follow POD for the pod above.
  • Instead of waiting for log rotation, force it: SSH into the node that hosts that pod/container, and run sudo /usr/sbin/logrotate --force /etc/logrotate.conf
  • Observe output stop. Restarting the kubectl logs command will resume log streaming.
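
For example, a throwaway pod like the following covers the first two steps (the pod name log-spewer and the busybox image are arbitrary choices, not part of the original report; on recent kubectl this creates a bare pod, while older clients create a Deployment instead):

# start a pod that prints a timestamp every second, then follow it
kubectl run log-spewer --image=busybox -- /bin/sh -c 'while true; do date; sleep 1; done'
kubectl logs --follow log-spewer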

To reproduce at the docker level, repeat the procedure above using ‘docker logs’ on the node, e.g.:

sudo DOCKER_HOST=unix:///var/run/docker.sock docker logs --tail=0 --follow CONTAINER_ID
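
To confirm that the container keeps writing after the forced rotation while the attached follower has gone quiet, the json-file log can be tailed directly on the node (this path is the standard layout for the json-file logging driver; substitute the real container ID):

# tail -F follows by name, so it re-opens the file across the rename done by rotation
sudo tail -F /var/lib/docker/containers/CONTAINER_ID/CONTAINER_ID-json.log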

Environment details

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.4", GitCommit:"3eed1e3be6848b877ff80a93da3785d9034d0a4f", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.4", GitCommit:"3eed1e3be6848b877ff80a93da3785d9034d0a4f", GitTreeState:"clean"}

docker version:

Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 12:59:02 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 12:59:02 UTC 2015
 OS/Arch:      linux/amd64

docker info:

Containers: 17
Images: 249
Server Version: 1.9.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 283
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
CPUs: 2
Total Memory: 7.815 GiB
Name: XXXXXXXXXXXXXXXXX
ID: 5MWW:M4LM:3UIM:5UOC:7AQD:DSWA:MIAA:WBFF:EMHR:2DH6:ITQC:QBVW
WARNING: No swap limit support

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 6
  • Comments: 16 (2 by maintainers)

Most upvoted comments

Why is this issue closed? It still appears to be a problem for many people (including myself). Every time I try to run kubectl logs -f, the output stops after about 10 minutes.

I’m using the latest version of kubectl that I installed today with gcloud components install kubectl

My environment looks like this (if it makes any difference):

$ gcloud info
Google Cloud SDK [266.0.0]

Platform: [Cygwin, x86_64] ('CYGWIN_NT-6.3-9600', 'ThinkPadW530', '3.0.7-338.x86_64', '2019-04-30 18:08 UTC', 'x86_64', '')
Locale: ('en_US', 'UTF-8')
Python Version: [2.7.16 (default, Mar 20 2019, 12:15:19)  [GCC 7.4.0]]
Python Location: [/usr/bin/python2]
Site Packages: [Disabled]

Installation Root: [/cygdrive/c/Users/Alex/AppData/Local/Google/Cloud SDK/google-cloud-sdk]
Installed Components:
  core: [2019.10.04]
  app-engine-python: [1.9.86]
  kubectl: [2019.09.22]
  beta: [2019.05.17]
  gsutil: [4.44]
  bq: [2.0.48]
  cloud-datastore-emulator: [2.1.0]
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-dispatcher", GitCommit:"2e298c7e992f83f47af60cf4830b11c7370f6668", GitTreeState:"clean", BuildDate:"2019-09-19T22:31:45Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.9-gke.15", GitCommit:"b48a8d693e191192e27c2f807daa51b54d0b0a61", GitTreeState:"clean", BuildDate:"2019-08-12T17:49:30Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"linux/amd64"}

Who do we need to bribe to get this reopened?

Why was this issue closed?

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

I came here looking for a fix for this problem, only to be disappointed: there is no fix, and the issue was auto-closed by the bot. 😦

Here is my workaround in NodeJS for Kontinuous (a CI/CD tool we’re developing). The implementation is very specific (it is tied to tailing job logs), but the principles can be decoupled for general use. I’m sharing it in case it helps someone: https://github.com/SocialGouv/kontinuous/blob/fe7345c5e33bda5990b08e2c60e3c997fc1c7a7d/packages/common/utils/kubejob-tail.js
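
A much cruder sketch of the same principle in plain shell, offered as an assumption-laden workaround rather than a fix (POD is a placeholder; a few lines may be duplicated or missed around each reconnect):

# re-attach whenever the follow stream ends, e.g. after log rotation
while true; do
  kubectl logs --follow --tail=10 POD
  sleep 1
done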

This is happening on GKE master version 1.13.6-gke.13 with GKE node version 1.12.8-gke.7, which runs docker 17.03.2-ce. Based on https://github.com/moby/moby/issues/37646 this should work, since according to hostconfig.json docker is running with --max-size=10m --max-files=5, but it doesn’t seem to.

Does the kubernetes API server tail through docker, or through the logs in /var/log/pods/? If it’s the latter: since the log in that directory is just a symlink, and docker’s log rotation does a rename/create, the issue might be that the API server doesn’t watch the symlink for changes and so never picks up the new file.

EDIT: Based on https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L1218 it looks like it does end up hitting the Docker daemon directly for streaming logs 🤔
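
For reference, on a dockershim-era node the symlink chain in question can be inspected directly; the exact layout varies by Kubernetes version, so treat the paths below as illustrative, with POD_UID as a placeholder:

# /var/log/containers/*.log -> /var/log/pods/<pod-uid>/... -> /var/lib/docker/containers/<id>/<id>-json.log
ls -l /var/log/containers/
ls -l /var/log/pods/POD_UID/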