sumologic-kubernetes-collection: Fluentbit not running after upgrading to K8s 1.20

Cluster: EKS 1.20, Sumologic chart: 2.1.1, Fluentbit version: 1.6.10

Hi Team,

After upgrading the k8s cluster to 1.20, fluentbit fails to start because the liveness/readiness probes are failing on port 2020.

Below are the pod events:

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  15m                   default-scheduler  Successfully assigned sumologic/sumologic-collector-fluent-bit-lpzck to ip-10-3-91-0.ec2.internal
  Normal   Killing    14m (x2 over 15m)     kubelet            Container fluent-bit failed liveness probe, will be restarted
  Normal   Pulled     14m (x3 over 15m)     kubelet            Container image "public.ecr.aws/sumologic/fluent-bit:1.6.10" already present on machine
  Normal   Created    14m (x3 over 15m)     kubelet            Created container fluent-bit
  Normal   Started    14m (x3 over 15m)     kubelet            Started container fluent-bit
  Warning  Unhealthy  14m (x8 over 15m)     kubelet            Readiness probe failed: Get "http://10.3.90.226:2020/": dial tcp 10.3.90.226:2020: connect: connection refused
  Warning  BackOff    5m43s (x28 over 12m)  kubelet            Back-off restarting failed container
  Warning  Unhealthy  48s (x25 over 15m)    kubelet            Liveness probe failed: Get "http://10.3.90.226:2020/": dial tcp 10.3.90.226:2020: connect: connection refused

There are no errors in the logs, and it was working fine before the upgrade. Please let me know if more information is required.

Thanks, Hussain

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 26 (12 by maintainers)

Most upvoted comments

@pmalek-sumo I can confirm that this issue is fixed with Bottlerocket OS 1.1.3. Thanks a lot for your help on this 😃

Thanks all for chipping in.

@cbuto The reason it worked on 1.3.5 seems to be that on that version we were using a different fluent-bit chart, from https://charts.helm.sh/stable (now deprecated), which doesn’t use liveness/readiness probes.

It is defined here in our collection chart, which translates to usage of the following daemonset template.


We could try verifying this on different versions of fluent-bit, but I believe there’s no use in that: this bug is about 1.6.10, while the referenced issue fluent/helm-charts#120 mentions 1.7.5.

My suggestion at this point would be to watch the fluent-bit issue filed at https://github.com/fluent/fluent-bit/issues/3521. As a workaround, I suggest disabling the probes with the chart options indicated by @hussainsaify in https://github.com/SumoLogic/sumologic-kubernetes-collection/issues/1638#issuecomment-856695764
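
For anyone who wants to check by hand whether the endpoint the probes hit is reachable, here is a minimal sketch that issues the same kind of plain HTTP GET shown in the events above. It is not how kubelet implements probes, only an approximation; the helper name `check_probe` and the 2-second timeout are assumptions for illustration, and the address is the one from the events in this report, so replace it with your own pod IP.

```python
import urllib.error
import urllib.request


def check_probe(url: str, timeout: float = 2.0) -> str:
    """Send one HTTP GET to `url` and describe the outcome as a short string.

    `check_probe` is a hypothetical helper for manual debugging; it only
    approximates what an httpGet probe does.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Reaching this point means the server answered with a non-error status.
            return f"ok: HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status code.
        return f"failed: HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Nothing answered: connection refused, timeout, DNS failure, etc.
        reason = getattr(exc, "reason", exc)
        return f"failed: {reason}"


if __name__ == "__main__":
    # Address taken from the pod events in this issue; substitute your pod IP.
    print(check_probe("http://10.3.90.226:2020/"))
```

Run it from somewhere that has network access to the pod (for example the node it is scheduled on). A "failed" result with "Connection refused" would match the probe failures in the events, meaning nothing is accepting connections on port 2020 at that address.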