kubernetes: log rotation on node breaks kubectl logs --follow / docker logs --follow

Hello,

There is an upstream problem with docker’s json-file log driver: ‘docker logs --follow’ (and, by extension, ‘kubectl logs --follow’) stops producing output when the node that hosts the given container rotates that container’s logs.

Environment:

Running kubernetes 1.2.4 on AWS.

Impact:

logspout or other tools that rely on ‘docker logs --follow’ / ‘kubectl logs --follow’ break during log rotation.

Steps to reproduce:

  • Launch a pod that generates regular stdout (see the example after this list).
  • Run kubectl logs --follow POD for the pod above.
  • Instead of waiting for log rotation, force it: SSH into the node that hosts that pod/container, and run sudo /usr/sbin/logrotate --force /etc/logrotate.conf
  • Observe output stop. Restarting the kubectl logs command will resume log streaming.
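
For example, a throwaway pod like the following covers the first two steps (the pod name log-spewer and the busybox image are arbitrary choices, not part of the original report; on recent kubectl this creates a bare pod, while older clients create a Deployment instead):

# start a pod that prints a timestamp every second, then follow it
kubectl run log-spewer --image=busybox -- /bin/sh -c 'while true; do date; sleep 1; done'
kubectl logs --follow log-spewer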

To reproduce at the docker level, repeat the procedure above using ‘docker logs’ on the node, e.g.:

sudo DOCKER_HOST=unix:///var/run/docker.sock docker logs --tail=0 --follow CONTAINER_ID
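
To confirm that the container keeps writing after the forced rotation while the attached follower has gone quiet, the json-file log can be tailed directly on the node (this path is the standard layout for the json-file logging driver; substitute the real container ID):

# tail -F follows by name, so it re-opens the file across the rename done by rotation
sudo tail -F /var/lib/docker/containers/CONTAINER_ID/CONTAINER_ID-json.log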

Environment details

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.4", GitCommit:"3eed1e3be6848b877ff80a93da3785d9034d0a4f", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.4", GitCommit:"3eed1e3be6848b877ff80a93da3785d9034d0a4f", GitTreeState:"clean"}

docker version:

Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 12:59:02 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 12:59:02 UTC 2015
 OS/Arch:      linux/amd64

docker info:

Containers: 17
Images: 249
Server Version: 1.9.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 283
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
CPUs: 2
Total Memory: 7.815 GiB
Name: XXXXXXXXXXXXXXXXX
ID: 5MWW:M4LM:3UIM:5UOC:7AQD:DSWA:MIAA:WBFF:EMHR:2DH6:ITQC:QBVW
WARNING: No swap limit support

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 6
  • Comments: 16 (2 by maintainers)

Most upvoted comments

Why is this issue closed? It still appears to be a problem for many people (including myself). Every time I try to run kubectl logs -f, the output stops after about 10 minutes.

I’m using the latest version of kubectl that I installed today with gcloud components install kubectl

My environment looks like this (if it makes any difference):

$ gcloud info
Google Cloud SDK [266.0.0]

Platform: [Cygwin, x86_64] ('CYGWIN_NT-6.3-9600', 'ThinkPadW530', '3.0.7-338.x86_64', '2019-04-30 18:08 UTC', 'x86_64', '')
Locale: ('en_US', 'UTF-8')
Python Version: [2.7.16 (default, Mar 20 2019, 12:15:19)  [GCC 7.4.0]]
Python Location: [/usr/bin/python2]
Site Packages: [Disabled]

Installation Root: [/cygdrive/c/Users/Alex/AppData/Local/Google/Cloud SDK/google-cloud-sdk]
Installed Components:
  core: [2019.10.04]
  app-engine-python: [1.9.86]
  kubectl: [2019.09.22]
  beta: [2019.05.17]
  gsutil: [4.44]
  bq: [2.0.48]
  cloud-datastore-emulator: [2.1.0]
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-dispatcher", GitCommit:"2e298c7e992f83f47af60cf4830b11c7370f6668", GitTreeState:"clean", BuildDate:"2019-09-19T22:31:45Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.9-gke.15", GitCommit:"b48a8d693e191192e27c2f807daa51b54d0b0a61", GitTreeState:"clean", BuildDate:"2019-08-12T17:49:30Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"linux/amd64"}

Who do we need to bribe to get this reopened?

Why was this issue closed?

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

I came here looking for a fix for this problem, only to be disappointed: there is no fix, and the issue was auto-closed by the bot. 😦

Here is my workaround in NodeJS for Kontinuous (a CI/CD tool we’re developing). The implementation is very specific (it is tied to tailing job logs), but the principles can be decoupled for general use. I’m sharing it in case it helps someone: https://github.com/SocialGouv/kontinuous/blob/fe7345c5e33bda5990b08e2c60e3c997fc1c7a7d/packages/common/utils/kubejob-tail.js
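
A much cruder sketch of the same principle in plain shell, offered as an assumption-laden workaround rather than a fix (POD is a placeholder; a few lines may be duplicated or missed around each reconnect):

# re-attach whenever the follow stream ends, e.g. after log rotation
while true; do
  kubectl logs --follow --tail=10 POD
  sleep 1
done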

This is happening on GKE master version 1.13.6-gke.13 with GKE node version 1.12.8-gke.7, which runs docker 17.03.2-ce. Based on https://github.com/moby/moby/issues/37646 this should work, since according to hostconfig.json docker is running with --max-size=10m --max-files=5, but it doesn’t seem to.

Does the kubernetes API server tail through docker, or through the logs in /var/log/pods/? If it’s the latter: since the log in that directory is just a symlink, and docker’s log rotation does a rename/create, the issue might be that the API server doesn’t watch the symlink for changes and so never picks up the new file.

EDIT: Based on https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L1218 it looks like it does end up hitting the Docker daemon directly for streaming logs 🤔
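
For reference, on a dockershim-era node the symlink chain in question can be inspected directly; the exact layout varies by Kubernetes version, so treat the paths below as illustrative, with POD_UID as a placeholder:

# /var/log/containers/*.log -> /var/log/pods/<pod-uid>/... -> /var/lib/docker/containers/<id>/<id>-json.log
ls -l /var/log/containers/
ls -l /var/log/pods/POD_UID/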