fluent-bit: Fluent-bit gets stuck after a few minutes on Kubernetes 1.22
Bug Report
Describe the bug
With the new Kubernetes 1.22, when we try to run fluent-bit as a Pod, it gets stuck after a few minutes and stops working.
If we restart the Pod manually, it works again, but only for another few minutes.
To Reproduce
I reproduced it with a really simple fluent-bit Pod and config.
I just deployed a classic Kubernetes 1.22.1 using kubeadm with docker (and added Calico as CNI), then created a Pod and a ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit
data:
  fluent-bit.conf: |-
    [SERVICE]
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_PORT    2020
        Flush        1
        Daemon       Off
        Log_Level    debug
        Health_Check On
    [INPUT]
        Name  dummy
        Dummy {"top": {".dotted": "value"}}
    [OUTPUT]
        Name stdout
---
apiVersion: v1
kind: Pod
metadata:
  name: fluent-bit
spec:
  containers:
    - image: docker.io/fluent/fluent-bit:1.8.6
      imagePullPolicy: IfNotPresent
      name: fluent-bit-new
      ports:
        - containerPort: 2020
          name: http-metrics
      volumeMounts:
        - mountPath: /fluent-bit/etc
          name: config
  volumes:
    - configMap:
        name: fluent-bit
      name: config
The fluent-bit config is really simple and just here for testing.
Everything works well and I get the expected output from the Pod every second:
{"log":"[0] dummy.0: [1630923559.270831593, {\"top\"=\u003e{\".dotted\"=\u003e\"value\"}}]\n","stream":"stdout","time":"2021-09-06T10:19:20.270984285Z"}
{"log":"[0] dummy.0: [1630923560.270835685, {\"top\"=\u003e{\".dotted\"=\u003e\"value\"}}]\n","stream":"stdout","time":"2021-09-06T10:19:21.271003948Z"}
And after a few minutes (~3-4 minutes in my tests) the Pod gets stuck and I do not get any output anymore.
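To watch for the stall from outside the container, I also poll the built-in HTTP server with a small script. This is a minimal sketch, assuming a kubectl port-forward pod/fluent-bit 2020:2020 is running so that HTTP_PORT 2020 is reachable on localhost; /api/v1/metrics is the endpoint discussed later in this thread:

import time
import urllib.request

# Assumes `kubectl port-forward pod/fluent-bit 2020:2020` is running, so the
# built-in HTTP server (HTTP_PORT 2020) is reachable on localhost.
URL = "http://127.0.0.1:2020/api/v1/metrics"

while True:
    stamp = time.strftime("%H:%M:%S")
    try:
        with urllib.request.urlopen(URL, timeout=5) as resp:
            print(stamp, resp.status, len(resp.read()), "bytes")
    except Exception as exc:
        # If the hang also blocks the HTTP server, requests start timing out
        # at roughly the same moment the stdout output stops.
        print(stamp, "request failed:", exc)
    time.sleep(10)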
Expected behavior
Fluent-bit should not get stuck after a few minutes.
Your Environment
- Version used: v1.8.6 (but also tested with v1.8.4 and it’s the same)
- Configuration:
  [SERVICE]
      HTTP_Server  On
      HTTP_Listen  0.0.0.0
      HTTP_PORT    2020
      Flush        1
      Daemon       Off
      Log_Level    debug
      Health_Check On
  [INPUT]
      Name  dummy
      Dummy {"top": {".dotted": "value"}}
  [OUTPUT]
      Name stdout
- Environment name and version (e.g. Kubernetes? What version?): Kubernetes 1.22.1
- Server type and version: VM
- Operating System and version: CentOS 7.9
- Filters and plugins:
Additional context
Note that if I downgrade kubelet to 1.21.4, for example, it works well and the fluent-bit Pod does not get stuck.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 3
- Comments: 20 (5 by maintainers)
Commits related to this issue
- salt: Disable `HTTP_Server` for fluent-bit: Fluent-bit `HTTP_Server` does not work for the moment with Kubernetes 1.22, so let's disable it since it is only used for fluent-bit metrics. See: https://git... — committed to scality/metalk8s by TeddyAndrieux 3 years ago
Note: https://github.com/fluent/fluent-bit/issues/4063#issuecomment-914463155
This is metrics data, which we can get from /api/v1/metrics.
Its format is MessagePack: \203 (0x83) means a fixmap of size 3, and \245 (0xa5) means a fixstr of size 5 (= "input").
The call path is collect_metrics (creates the metrics data in MessagePack) -> flb_hs_push_pipeline_metrics -> mk_mq_send -> mk_fifo_send -> msg_write -> write (stuck here). records\314\324 means uint8 (0xcc) + 212; below is the value in JSON.
All counters indicate "records": 212, i.e. the dummy plugin did not ingest any new record.
After a bit more investigation it seems linked to HTTP_Server: when I disabled it, the Pod does not get stuck anymore.
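For reference, the workaround amounts to turning the HTTP server off in the [SERVICE] section. This is a sketch based on the config above; note that it also removes the /api/v1/metrics endpoint (and Health_Check, which is served over the same HTTP server), so it is only an option if you do not scrape fluent-bit metrics:

[SERVICE]
    HTTP_Server  Off
    Flush        1
    Daemon       Off
    Log_Level    debug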