sumologic-kubernetes-collection: Fluentbit not running after upgrading to K8s 1.20
Cluster: EKS 1.20
Sumologic chart: 2.1.1
Fluent Bit version: 1.6.10
Hi Team,
After upgrading the k8s cluster to 1.20, fluent-bit fails to start because the liveness/readiness probes are failing on port 2020.
The pod events are below:
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  15m                    default-scheduler  Successfully assigned sumologic/sumologic-collector-fluent-bit-lpzck to ip-10-3-91-0.ec2.internal
  Normal   Killing    14m (x2 over 15m)      kubelet            Container fluent-bit failed liveness probe, will be restarted
  Normal   Pulled     14m (x3 over 15m)      kubelet            Container image "public.ecr.aws/sumologic/fluent-bit:1.6.10" already present on machine
  Normal   Created    14m (x3 over 15m)      kubelet            Created container fluent-bit
  Normal   Started    14m (x3 over 15m)      kubelet            Started container fluent-bit
  Warning  Unhealthy  14m (x8 over 15m)      kubelet            Readiness probe failed: Get "http://10.3.90.226:2020/": dial tcp 10.3.90.226:2020: connect: connection refused
  Warning  BackOff    5m43s (x28 over 12m)   kubelet            Back-off restarting failed container
  Warning  Unhealthy  48s (x25 over 15m)     kubelet            Liveness probe failed: Get "http://10.3.90.226:2020/": dial tcp 10.3.90.226:2020: connect: connection refused
There are no errors in the logs, and it was working fine before the upgrade. Please let me know if more information is required.
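To check from outside the kubelet whether anything is listening on the probe port, something like the following could be run from a host that can reach the pod IP. It is only a rough sketch of an HTTP probe: it treats any status from 200 to 399 as healthy, which is how Kubernetes `httpGet` probes judge success, and reports everything else (including `connection refused`) as a failure. The function name `http_probe` and the 1-second default timeout are illustrative choices, not something taken from the chart.

```python
import urllib.error
import urllib.request
from typing import Tuple


def http_probe(url: str, timeout: float = 1.0) -> Tuple[bool, str]:
    """Return (healthy, detail) for a single HTTP GET against `url`.

    Hypothetical helper: mimics the success rule of a Kubernetes httpGet
    probe (status 200-399 is healthy), nothing more.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Nothing answered: connection refused, timeout, unreachable host...
        return False, f"probe failed: {exc}"
```

For the pod in the events above, the call would be `http_probe("http://10.3.90.226:2020/")`. A `(False, "probe failed: ...")` result would match what the kubelet reports, i.e. nothing is accepting connections on port 2020 as opposed to the endpoint returning an error status.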
Thanks, Hussain
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 26 (12 by maintainers)
@pmalek-sumo I can confirm that this issue is fixed with Bottlerocket OS 1.1.3. Thanks a lot for your help on this 😃
Thanks all for chipping in.
@cbuto The reason it worked on 1.3.5 seems to be that on that version we were using a different fluent-bit chart from https://charts.helm.sh/stable (which is now deprecated), which doesn’t use liveness/readiness probes. It is defined here in our collection chart, and this translates to usage of the following daemonset template.
We could try verifying this on different versions of fluent-bit, but I believe there’s no use in that: this bug was reported against 1.6.10, while the referenced issue fluent/helm-charts#120 mentions 1.7.5.
My suggestion at this point would be to watch the fluent-bit issue filed at https://github.com/fluent/fluent-bit/issues/3521. As a workaround, I suggest disabling the probes with the chart options indicated by @hussainsaify in https://github.com/SumoLogic/sumologic-kubernetes-collection/issues/1638#issuecomment-856695764