kubernetes: Pod logs: long lines are corrupted when using timestamps=true

What happened:

Docker has a hardcoded 16K limit on log line length, which splits lines longer than that into multiple messages in the log file. ReadLogs in logs.go, however, doesn’t account for this when timestamps is set to true and prints the timestamp in front of every entry in the log file instead of every logical line. This leads to random timestamps appearing in the middle of log lines, potentially breaking the applications that parse these logs.

What you expected to happen:

ReadLogs inserts timestamps once per logical line, so original log data is preserved. Alternatively, allow /logs API to request the logs in a sane format like JSON, which would allow merging the lines back and combining them with the timestamps more intelligently on the application side.

How to reproduce it (as minimally and precisely as possible):

Create foo.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: log-lines
spec:
  containers:
  - name: main
    image: busybox
    command: ['sh', '-c', 'while sleep 1; do echo -n ___; for i in $(seq 1 18000); do echo -n A; done; echo ___; done']

Start the pod with kubectl apply -f foo.yaml
Fetch the logs with kubectl logs log-lines --timestamps, or using the API (curl https://…/api/v1/namespaces/default/pods/log-lines/log?timestamps=true&container=main)
Observe random timestamps inserted in the middle of the log lines.

Environment:

Kubernetes version (use kubectl version):

Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

About this issue

Original URL
State: closed
Created 5 years ago
Reactions: 6
Comments: 41 (19 by maintainers)

Commits related to this issue

add retry logic to receptor logstream on kube resume kube job log using kube log timestamp require fix for https://github.com/kubernetes/kubernetes/issues/77603 in https://github.com/kubernetes/kube... — committed to TheRealHaoLiu/receptor by TheRealHaoLiu 2 years ago
add retry logic to receptor logstream on kube resume kube job log using kube log timestamp require fix for https://github.com/kubernetes/kubernetes/issues/77603 in https://github.com/kubernetes/kube... — committed to TheRealHaoLiu/receptor by TheRealHaoLiu 2 years ago
add retry logic to receptor logstream on kube resume kube job log using kube log timestamp require fix for https://github.com/kubernetes/kubernetes/issues/77603 in https://github.com/kubernetes/kube... — committed to ansible/receptor by TheRealHaoLiu 2 years ago

Most upvoted comments

I received a bug report about this issue. The root cause is that the buffer reads up to a new line. If the line is greater than 4096 bytes and ‘–timestamps’ is turrned on the kubelet will write the timestamp and the partial log line. We will need to refactor the ReadLogs function to allow for a partial line read.

https://github.com/kubernetes/kubernetes/blob/f892ab1bd7fd97f1fcc2e296e85fdb8e3e8fb82d/pkg/kubelet/kuberuntime/logs/logs.go#L325

rphillips on Oct 31, 2022

@aermakov-zalando I will work on it.

/sig cli /remove-sig node

YoubingLi on May 9, 2019