kubernetes: ContainerLogMaxFiles not being honored for dead containers

What happened: When a pod gets restarted because of a failed livenessProbe or similar, the ContainerLogMaxFiles setting is not honored.

Here is an ls of the log directory of my test pod, which has an intentionally failing livenessProbe:

total 1.7G
-rw-r----- 1 root root 236M Apr 21 08:25 15.log.20200421-082527
-rw-r----- 1 root root 191M Apr 21 07:54 1.log.20200421-075440
-rw-r----- 1 root root 153M Apr 21 07:54 0.log.20200421-075407
-rw-r----- 1 root root 148M Apr 21 08:31 16.log.20200421-083105
-rw-r----- 1 root root 128M Apr 21 07:57 5.log.20200421-075719
-rw-r----- 1 root root 120M Apr 21 08:31 17.log.20200421-083129
-rw-r----- 1 root root  98M Apr 21 08:37 18.log.20200421-083700
-rw-r----- 1 root root  94M Apr 21 08:43 21.log.20200421-084336
-rw-r----- 1 root root  59M Apr 21 08:48 22.log.20200421-084858
-rw-r----- 1 root root  49M Apr 21 08:43 20.log.20200421-084306
-rw-r----- 1 root root  44M Apr 21 08:55 25.log.20200421-085523
-rw-r----- 1 root root  43M Apr 21 08:37 19.log.20200421-083715
-rw-r----- 1 root root  35M Apr 21 08:54 24.log.20200421-085450
-rw-r----- 1 root root  20M Apr 21 08:49 23.log.20200421-084920
-rw-r--r-- 1 root root  19M Apr 21 07:56 4.log.20200421-075558.gz
-rw-r--r-- 1 root root  19M Apr 21 08:13 11.log.20200421-081316.gz
-rw-r--r-- 1 root root  19M Apr 21 08:07 9.log.20200421-080716.gz
-rw-r--r-- 1 root root  18M Apr 21 08:19 13.log.20200421-081915.gz
-rw-r--r-- 1 root root  17M Apr 21 08:25 14.log.20200421-082448.gz
-rw-r--r-- 1 root root  17M Apr 21 07:58 6.log.20200421-075750.gz
-rw-r--r-- 1 root root  16M Apr 21 07:55 3.log.20200421-075525.gz
-rw-r--r-- 1 root root  15M Apr 21 08:07 8.log.20200421-080647.gz
-rw-r--r-- 1 root root  15M Apr 21 07:55 2.log.20200421-075454.gz
-rw-r--r-- 1 root root  14M Apr 21 07:54 1.log.20200421-075423.gz
-rw-r--r-- 1 root root  13M Apr 21 07:54 0.log.20200421-075353.gz
-rw-r--r-- 1 root root  13M Apr 21 08:01 7.log.20200421-080104.gz
-rw-r--r-- 1 root root  11M Apr 21 08:19 12.log.20200421-081847.gz
-rw-r--r-- 1 root root 9.0M Apr 21 08:13 10.log.20200421-081247.gz
-rw-r--r-- 1 root root 6.0M Apr 21 08:31 16.log.20200421-083050.gz
-rw-r--r-- 1 root root 5.0M Apr 21 07:57 6.log.20200421-075735.gz
-rw-r--r-- 1 root root 4.9M Apr 21 08:31 17.log.20200421-083118.gz
-rw-r--r-- 1 root root 4.3M Apr 21 08:37 18.log.20200421-083646.gz
-rw-r--r-- 1 root root 3.7M Apr 21 07:57 5.log.20200421-075708.gz
-rw-r--r-- 1 root root 2.4M Apr 21 08:06 8.log.20200421-080633.gz
-rw-r--r-- 1 root root 2.0M Apr 21 08:24 14.log.20200421-082434.gz
-rw-r--r-- 1 root root 1.8M Apr 21 07:53 0.log.20200421-075342.gz
-rw-r--r-- 1 root root 1.6M Apr 21 08:43 21.log.20200421-084316.gz
drwxr-xr-x 2 root root 4.0K Apr 21 11:44 .
drwxr-xr-x 3 root root 4.0K Apr 21 07:53 ..
-rw-r----- 1 root root    0 Apr 21 11:44 81.log

The setting was honored only for the currently running pod. When I delete the deployment and recreate it without the livenessProbe, there are at most 5 files in the directory at all times, which is the default setting.
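To make the listing easier to read, here is a minimal sketch that groups log file names by restart count. It assumes the name pattern `<restartCount>.log[.<timestamp>][.gz]` inferred from the listing above; the function name `count_logs_per_restart` is made up for illustration and is not part of any Kubernetes tooling.

```python
import re
from collections import Counter

# Pattern inferred from the ls output: <restart>.log[.<YYYYMMDD-HHMMSS>][.gz]
LOG_NAME = re.compile(r"^(\d+)\.log(?:\.(\d{8}-\d{6}))?(\.gz)?$")


def count_logs_per_restart(names):
    """Return a Counter mapping restart count -> number of log files."""
    counts = Counter()
    for name in names:
        match = LOG_NAME.match(name)
        if match:  # ignore anything that does not look like a container log
            counts[int(match.group(1))] += 1
    return counts


# A few file names copied from the listing above.
sample = [
    "0.log.20200421-075407",
    "0.log.20200421-075353.gz",
    "0.log.20200421-075342.gz",
    "1.log.20200421-075440",
    "1.log.20200421-075423.gz",
    "81.log",
]
counts = count_logs_per_restart(sample)
for restart, n in sorted(counts.items()):
    print(f"restart {restart}: {n} file(s)")
```

Run against the full directory (for example with `os.listdir`), this should show that each individual restart stays within the per-container limit, while the total number of files keeps growing with the number of restarts.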

What you expected to happen: I expected at most the logs of the previous instance to be kept, not the logs of many previous instances.

How to reproduce it (as minimally and precisely as possible): Create a deployment that continuously prints to stdout and check how many log files are created at most. Now add a livenessProbe that fails intentionally, and you should see the above behaviour: all previous logs are retained.
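A possible reproduction manifest, generated as JSON so it can be piped to kubectl. This is only a sketch and not the reporter's actual deployment: the name `log-spammer`, the `busybox` image, the echo loop, and the probe timings are all assumptions chosen for illustration.

```python
import json


def failing_probe_deployment(name="log-spammer"):
    """Build a Deployment whose container floods stdout and always fails its livenessProbe."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": "busybox",  # assumed image; any image with a shell works
        # Print to stdout in a tight loop so the log rotates quickly.
        "command": ["sh", "-c", "while true; do echo 'filling the container log'; done"],
        # `false` always exits non-zero, so the probe fails and the container is restarted.
        "livenessProbe": {
            "exec": {"command": ["false"]},
            "periodSeconds": 10,
            "failureThreshold": 1,
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


manifest_json = json.dumps(failing_probe_deployment(), indent=2)
print(manifest_json)
```

Saved as a script (the file name is up to you), the output can be applied with `python <script> | kubectl apply -f -`. Removing the `livenessProbe` key gives the control case without restarts.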

Anything else we need to know?: We are using the default values for ContainerLogMaxSize and ContainerLogMaxFiles.

Environment:

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.10", GitCommit:"1bea6c00a7055edef03f1d4bb58b773fa8917f11", GitTreeState:"clean", BuildDate:"2020-02-11T20:13:57Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.10", GitCommit:"1bea6c00a7055edef03f1d4bb58b773fa8917f11", GitTreeState:"clean", BuildDate:"2020-02-11T20:05:26Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider or hardware configuration: GCP with n1 machine types (in this case: 1 master, 1 node, and 1 etcd)
OS (e.g: cat /etc/os-release):

NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Kernel (e.g. uname -a):

Linux minion0 5.0.0-1034-gcp #35-Ubuntu SMP Tue Mar 17 03:56:45 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

About this issue

State: closed
Created 4 years ago
Reactions: 1
Comments: 74 (66 by maintainers)

Most upvoted comments

Adding test.

PR coming Monday.

There is an optimization: in the loop of containerLogManager#pruneDeadContainerLogs, if a given pod does not have as many as containersToKeep containers, we can skip to the next pod.

tedyu on May 16, 2020
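
The skip described in this comment can be sketched roughly as follows. This is not the kubelet implementation (which is written in Go) and does not reflect the eventual PR. The data layout, the function name `dead_containers_to_prune`, and the choice to also skip pods with exactly `containers_to_keep` containers (which have nothing to prune either) are assumptions made for illustration.

```python
def dead_containers_to_prune(pods, containers_to_keep):
    """Return (pod_uid, container_id) pairs whose logs could be removed.

    pods maps a pod UID to a list of dicts with the hypothetical keys
    "id", "created" (larger means newer) and "running".
    """
    prune = []
    for pod_uid, containers in pods.items():
        # The optimization: with at most containers_to_keep containers
        # there is nothing to prune, so skip to the next pod.
        if len(containers) <= containers_to_keep:
            continue
        newest_first = sorted(containers, key=lambda c: c["created"], reverse=True)
        # Everything older than the newest containers_to_keep is a candidate,
        # but a running container is never pruned.
        for c in newest_first[containers_to_keep:]:
            if not c["running"]:
                prune.append((pod_uid, c["id"]))
    return prune


pods = {
    "pod-a": [
        {"id": "a0", "created": 1, "running": False},
        {"id": "a1", "created": 2, "running": False},
        {"id": "a2", "created": 3, "running": True},
    ],
    "pod-b": [{"id": "b0", "created": 1, "running": True}],
}
result = dead_containers_to_prune(pods, containers_to_keep=2)
print(result)
```

Here `pod-b` is skipped immediately by the length check, and only the oldest dead container of `pod-a` is selected.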