kubernetes: ContainerLogMaxFiles not being honored for dead containers
What happened:
When a pod gets restarted because of a failed livenessProbe or similar, the ContainerLogMaxFiles setting is not honored.
Here is an ls of the log directory of my test pod, which has an intentionally failing livenessProbe:
total 1.7G
-rw-r----- 1 root root 236M Apr 21 08:25 15.log.20200421-082527
-rw-r----- 1 root root 191M Apr 21 07:54 1.log.20200421-075440
-rw-r----- 1 root root 153M Apr 21 07:54 0.log.20200421-075407
-rw-r----- 1 root root 148M Apr 21 08:31 16.log.20200421-083105
-rw-r----- 1 root root 128M Apr 21 07:57 5.log.20200421-075719
-rw-r----- 1 root root 120M Apr 21 08:31 17.log.20200421-083129
-rw-r----- 1 root root 98M Apr 21 08:37 18.log.20200421-083700
-rw-r----- 1 root root 94M Apr 21 08:43 21.log.20200421-084336
-rw-r----- 1 root root 59M Apr 21 08:48 22.log.20200421-084858
-rw-r----- 1 root root 49M Apr 21 08:43 20.log.20200421-084306
-rw-r----- 1 root root 44M Apr 21 08:55 25.log.20200421-085523
-rw-r----- 1 root root 43M Apr 21 08:37 19.log.20200421-083715
-rw-r----- 1 root root 35M Apr 21 08:54 24.log.20200421-085450
-rw-r----- 1 root root 20M Apr 21 08:49 23.log.20200421-084920
-rw-r--r-- 1 root root 19M Apr 21 07:56 4.log.20200421-075558.gz
-rw-r--r-- 1 root root 19M Apr 21 08:13 11.log.20200421-081316.gz
-rw-r--r-- 1 root root 19M Apr 21 08:07 9.log.20200421-080716.gz
-rw-r--r-- 1 root root 18M Apr 21 08:19 13.log.20200421-081915.gz
-rw-r--r-- 1 root root 17M Apr 21 08:25 14.log.20200421-082448.gz
-rw-r--r-- 1 root root 17M Apr 21 07:58 6.log.20200421-075750.gz
-rw-r--r-- 1 root root 16M Apr 21 07:55 3.log.20200421-075525.gz
-rw-r--r-- 1 root root 15M Apr 21 08:07 8.log.20200421-080647.gz
-rw-r--r-- 1 root root 15M Apr 21 07:55 2.log.20200421-075454.gz
-rw-r--r-- 1 root root 14M Apr 21 07:54 1.log.20200421-075423.gz
-rw-r--r-- 1 root root 13M Apr 21 07:54 0.log.20200421-075353.gz
-rw-r--r-- 1 root root 13M Apr 21 08:01 7.log.20200421-080104.gz
-rw-r--r-- 1 root root 11M Apr 21 08:19 12.log.20200421-081847.gz
-rw-r--r-- 1 root root 9.0M Apr 21 08:13 10.log.20200421-081247.gz
-rw-r--r-- 1 root root 6.0M Apr 21 08:31 16.log.20200421-083050.gz
-rw-r--r-- 1 root root 5.0M Apr 21 07:57 6.log.20200421-075735.gz
-rw-r--r-- 1 root root 4.9M Apr 21 08:31 17.log.20200421-083118.gz
-rw-r--r-- 1 root root 4.3M Apr 21 08:37 18.log.20200421-083646.gz
-rw-r--r-- 1 root root 3.7M Apr 21 07:57 5.log.20200421-075708.gz
-rw-r--r-- 1 root root 2.4M Apr 21 08:06 8.log.20200421-080633.gz
-rw-r--r-- 1 root root 2.0M Apr 21 08:24 14.log.20200421-082434.gz
-rw-r--r-- 1 root root 1.8M Apr 21 07:53 0.log.20200421-075342.gz
-rw-r--r-- 1 root root 1.6M Apr 21 08:43 21.log.20200421-084316.gz
drwxr-xr-x 2 root root 4.0K Apr 21 11:44 .
drwxr-xr-x 3 root root 4.0K Apr 21 07:53 ..
-rw-r----- 1 root root 0 Apr 21 11:44 81.log
This setting was honored only for the currently running pod. When I delete the deployment and recreate it without the livenessProbe, there are at most 5 files in the directory at all times, which is the default setting.
What you expected to happen: I expected there to be at most the logs of the previous instance, not the logs of many previous instances.
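To make the mismatch easier to quantify, here is a minimal Go sketch that counts log files per container instance from file names like those in the listing above. It assumes the number before `.log` is the container's restart count; the helper names `restartIndex` and `countByRestart` are hypothetical and not part of the kubelet.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// restartIndex extracts the leading number from a log file name such as
// "15.log.20200421-082527" or "4.log.20200421-075558.gz".
// Treating that number as the restart count is an assumption of this sketch.
func restartIndex(name string) (int, bool) {
	prefix, _, found := strings.Cut(name, ".log")
	if !found {
		return 0, false
	}
	n, err := strconv.Atoi(prefix)
	if err != nil {
		return 0, false
	}
	return n, true
}

// countByRestart returns how many log files exist for each restart index.
// Names that do not match the expected pattern are ignored.
func countByRestart(names []string) map[int]int {
	counts := make(map[int]int)
	for _, name := range names {
		if idx, ok := restartIndex(name); ok {
			counts[idx]++
		}
	}
	return counts
}

func main() {
	// A few names copied from the directory listing above.
	names := []string{
		"0.log.20200421-075407",
		"0.log.20200421-075353.gz",
		"0.log.20200421-075342.gz",
		"1.log.20200421-075440",
		"1.log.20200421-075423.gz",
		"81.log",
	}
	counts := countByRestart(names)
	fmt.Println(len(counts), "container instances have logs on disk")
}
```

If only the logs of the current and previous instance were retained, the map would hold at most two keys; in the listing above it holds far more.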
How to reproduce it (as minimally and precisely as possible): Create a deployment that continuously prints to stdout, and check how many log files are created at most. Now add a livenessProbe that fails intentionally, and you should see the above behaviour of all previous logs being retained.
Anything else we need to know?:
We are using the default values for ContainerLogMaxSize and ContainerLogMaxFiles.
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.10", GitCommit:"1bea6c00a7055edef03f1d4bb58b773fa8917f11", GitTreeState:"clean", BuildDate:"2020-02-11T20:13:57Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.10", GitCommit:"1bea6c00a7055edef03f1d4bb58b773fa8917f11", GitTreeState:"clean", BuildDate:"2020-02-11T20:05:26Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: GCP with n1 machine types (in this case: 1 master, 1 node, and 1 etcd)
- OS (e.g. cat /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
- Kernel (e.g. uname -a):
Linux minion0 5.0.0-1034-gcp #35-Ubuntu SMP Tue Mar 17 03:56:45 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 74 (66 by maintainers)
Adding test.
PR coming Monday.
There is an optimization: in the loop of containerLogManager#pruneDeadContainerLogs, if a given pod has fewer than containersToKeep containers, we can skip to the next pod.
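The skip described in this comment could look roughly like the Go sketch below. This is not the kubelet code: the `deadContainer` type, the `containersToPrune` function, and the map-by-pod input are hypothetical stand-ins used only to illustrate the early `continue`. The sketch also skips when the count equals the keep limit, since nothing would be pruned in that case either.

```go
package main

import (
	"fmt"
	"sort"
)

// deadContainer is a hypothetical record for an exited container.
type deadContainer struct {
	ID      string
	Attempt int // restart count; a higher value means a newer instance
}

// containersToPrune returns, per pod, the dead containers whose logs could
// be removed while keeping the `keep` most recent instances.
func containersToPrune(byPod map[string][]deadContainer, keep int) map[string][]deadContainer {
	if keep < 0 {
		keep = 0
	}
	prune := make(map[string][]deadContainer)
	for pod, containers := range byPod {
		// The optimization: with no more than `keep` dead containers there
		// is nothing to prune, so move on to the next pod without sorting.
		if len(containers) <= keep {
			continue
		}
		// Copy before sorting so the caller's slice is left untouched.
		sorted := append([]deadContainer(nil), containers...)
		sort.Slice(sorted, func(i, j int) bool {
			return sorted[i].Attempt > sorted[j].Attempt
		})
		prune[pod] = sorted[keep:]
	}
	return prune
}

func main() {
	byPod := map[string][]deadContainer{
		"pod-a": {{ID: "a0", Attempt: 0}, {ID: "a1", Attempt: 1}, {ID: "a2", Attempt: 2}},
		"pod-b": {{ID: "b0", Attempt: 0}},
	}
	for pod, victims := range containersToPrune(byPod, 1) {
		fmt.Println(pod, "has", len(victims), "dead containers to prune")
	}
}
```

Pods below the limit never reach the sort, which is the saving the comment points at.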