source-controller: `Readiness probe failed at startup` when reconciling many helm charts

Describe the bug

With 20-25 Helm releases to reconcile, the source-controller readinessProbe starts to fail and the pod enters CrashLoopBackOff state.

To Reproduce

Steps to reproduce the behaviour:

Make the source-controller busy by adding 20+ Helm releases to reconcile. I’m sure the specific charts and the location of their repositories matter, because some take longer to fetch than others.

Expected behavior

A running source-controller shouldn’t fail its readinessProbe.

Additional context

I was able to fix the issue by setting the following on the source-controller readinessProbe:

      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 10

  • Kubernetes version: 1.19
  • Git provider: github.com
  • Container registry provider: ECR
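
For anyone wanting to apply the same probe settings, here is a rough sketch of a strategic merge patch for the source-controller Deployment. The flux-system namespace and the container name manager are assumptions based on a default Flux install, not something confirmed in this report, so check them against your own manifests first.

      # Sketch only: namespace and container name are assumed, verify before use.
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: source-controller
        namespace: flux-system   # assumed default install namespace
      spec:
        template:
          spec:
            containers:
              - name: manager    # assumed container name
                readinessProbe:
                  # Values taken from the workaround reported above.
                  periodSeconds: 30
                  successThreshold: 1
                  timeoutSeconds: 10

A patch like this could be referenced from the kustomization.yaml that pulls in the Flux components, so that the override is kept in Git rather than applied by hand.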

Below please provide the output of the following commands:

flux version 0.9.0
► checking prerequisites
✗ flux 0.9.0 <0.10.0 (new version is available, please upgrade)
✔ kubectl 1.20.4 >=1.18.0-0
✔ Kubernetes 1.17.12-eks-7684af >=1.16.0-0
► checking controllers
✔ helm-controller: healthy
► ghcr.io/fluxcd/helm-controller:v0.8.0
✔ kustomize-controller: healthy
► ghcr.io/fluxcd/kustomize-controller:v0.9.1
✔ notification-controller: healthy
► ghcr.io/fluxcd/notification-controller:v0.9.0
✔ source-controller: healthy
► ghcr.io/fluxcd/source-controller:v0.9.0
✔ all checks passed

pod/helm-controller-775f66d8f4-vqjgl          1/1     Running   0          23h
pod/kustomize-controller-5cb59f847c-qwqlb     1/1     Running   0          23h
pod/notification-controller-55dcddfc7-2qh9p   1/1     Running   0          23h
pod/source-controller-7f85d79d74-gdqvb        1/1     Running   2          15m

NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/notification-controller   ClusterIP   172.20.4.105     <none>        80/TCP    20d
service/source-controller         ClusterIP   172.20.178.251   <none>        80/TCP    20d
service/webhook-receiver          ClusterIP   172.20.17.200    <none>        80/TCP    20d

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/helm-controller           1/1     1            1           20d
deployment.apps/kustomize-controller      1/1     1            1           20d
deployment.apps/notification-controller   1/1     1            1           20d
deployment.apps/source-controller         1/1     1            1           20d

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/helm-controller-775f66d8f4          1         1         1       20d
replicaset.apps/kustomize-controller-5cb59f847c     1         1         1       20d
replicaset.apps/notification-controller-55dcddfc7   1         1         1       20d
replicaset.apps/source-controller-7f85d79d74        1         1         1       15m
replicaset.apps/source-controller-85c64bc47b        0         0         0       20d

other logs available upon request

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 3
  • Comments: 15 (6 by maintainers)

Most upvoted comments

I tried with Flux 0.23.0 and cannot reproduce the issue.