source-controller: OOM events
Describe the bug
When registering Flux against a repository on self-hosted GitLab Enterprise, the source-controller pod is repeatedly OOM-killed. Removing its 1 GB memory limit makes the problem go away.
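A minimal sketch of raising the limit instead of removing it. It assumes the `flux-system/kustomization.yaml` generated by `flux bootstrap` (listing `gotk-components.yaml` and `gotk-sync.yaml`), that the source-controller Deployment has a single container at index 0, and that the container already sets a memory limit (the 1 GB mentioned above). The `2Gi` value is an illustrative assumption, not a tested recommendation; `op: remove` on the same path would mirror the workaround described above.

```yaml
# flux-system/kustomization.yaml (sketch; the 2Gi value is an assumption)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # JSON 6902 patch applied only to the source-controller Deployment;
  # "replace" requires the limit to exist already in gotk-components.yaml
  - target:
      kind: Deployment
      name: source-controller
    patch: |
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 2Gi
```

Patching the bootstrap kustomization rather than editing `gotk-components.yaml` directly keeps the change from being overwritten when the components are regenerated.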
To Reproduce
Register Flux on a reasonably complex repository. I believe repository size or complexity is the trigger, but this is not confirmed.
Expected behavior
The source-controller pod should not be killed and restarted repeatedly.
Additional context
- Kubernetes version: 1.19
- Git provider: GitLab (self-hosted)
- Container registry provider: GitLab/ECR
Output of the requested diagnostic commands:
flux --version: flux version 0.8.0
flux check
► checking prerequisites
✔ kubectl 1.19.3 >=1.18.0
✔ Kubernetes 1.19.6-eks-49a6c0 >=1.16.0
► checking controllers
✔ source-controller: healthy
► ghcr.io/fluxcd/source-controller:v0.8.1
✔ kustomize-controller: healthy
► ghcr.io/fluxcd/kustomize-controller:v0.8.1
✔ helm-controller: healthy
► ghcr.io/fluxcd/helm-controller:v0.7.0
✔ notification-controller: healthy
► ghcr.io/fluxcd/notification-controller:v0.8.0
✔ all checks passed
kubectl -n flux-system get all
NAME                                           READY   STATUS             RESTARTS   AGE
pod/helm-controller-6946b6dc7f-5nr8q           1/1     Running            0          9m34s
pod/kustomize-controller-55dfcdfd58-xj25c      1/1     Running            0          10h
pod/notification-controller-649754966b-2677x   1/1     Running            0          10h
pod/source-controller-597cc769b-lp6w4          0/1     CrashLoopBackOff   5          6m23s

NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/notification-controller   ClusterIP   10.100.114.245   <none>        80/TCP    10h
service/source-controller         ClusterIP   10.100.185.20    <none>        80/TCP    10h
service/webhook-receiver          ClusterIP   10.100.198.200   <none>        80/TCP    10h

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/helm-controller           1/1     1            1           10h
deployment.apps/kustomize-controller      1/1     1            1           10h
deployment.apps/notification-controller   1/1     1            1           10h
deployment.apps/source-controller         0/1     1            0           10h

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/helm-controller-6779d46d69           0         0         0       10h
replicaset.apps/helm-controller-6946b6dc7f           1         1         1       9m34s
replicaset.apps/kustomize-controller-55dfcdfd58      1         1         1       10h
replicaset.apps/notification-controller-649754966b   1         1         1       10h
replicaset.apps/source-controller-555d4f9d6          0         0         0       10h
replicaset.apps/source-controller-597cc769b          1         1         0       10h
kubectl -n <namespace> logs deploy/source-controller
--- various log lines, no errors, until the pod is killed ---
kubectl -n <namespace> logs deploy/kustomize-controller
--- various log lines ---
{"level":"info","ts":"2021-02-24T00:06:40.724Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"istio-system","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:06:41.811Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"bookinfo","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:06:41.815Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"calico","namespace":"flux-system"}
{"level":"error","ts":"2021-02-24T00:06:41.825Z","logger":"controller.kustomization","msg":"Reconciliation failed after 1.059192016s, next try in 5m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"podinfo","namespace":"flux-system","revision":"master/e43ebfa5bf4b87c46f2e1db495eb571cd398e2f7","error":"failed to download artifact from http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/podinfo/e43ebfa5bf4b87c46f2e1db495eb571cd398e2f7.tar.gz, error: Get \"http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/podinfo/e43ebfa5bf4b87c46f2e1db495eb571cd398e2f7.tar.gz\": dial tcp 10.100.185.20:80: connect: connection refused"}
{"level":"info","ts":"2021-02-24T00:06:41.843Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"kafka","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:07:41.833Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"bookinfo","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:07:41.834Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"calico","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:07:41.853Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"kafka","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:08:41.853Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"calico","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:08:41.855Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"bookinfo","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:08:41.863Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"kafka","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:09:41.872Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"calico","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:09:41.874Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"bookinfo","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:09:41.875Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"kafka","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:10:41.893Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"calico","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:10:41.895Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"kafka","namespace":"flux-system"}
{"level":"info","ts":"2021-02-24T00:10:41.895Z","logger":"controller.kustomization","msg":"Source is not ready, artifact not found","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"bookinfo","namespace":"flux-system"}
About this issue
- State: open
- Created 3 years ago
- Comments: 17 (5 by maintainers)
Commits related to this issue
- Update gotk-components.yaml updated source-controller deployment according to this issue: https://github.com/fluxcd/source-controller/issues/303 — committed to apatelGWS/flux2-kustomize-helm-example by apatelGWS 2 years ago
- Update gotk-components.yaml https://github.com/fluxcd/source-controller/issues/303#issuecomment-905297403 — committed to apatelGWS/flux2-kustomize-helm-example by apatelGWS 2 years ago
For large Helm repository index files, you can enable caching to reduce source-controller's memory footprint. See the docs: https://fluxcd.io/docs/cheatsheets/bootstrap/#enable-helm-repositories-caching
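A sketch of what enabling that cache could look like in the bootstrap `flux-system/kustomization.yaml`. The flag names `--helm-cache-max-size`, `--helm-cache-ttl` and `--helm-cache-purge-interval` are assumed to match the linked cheatsheet, and they exist only in source-controller releases considerably newer than the v0.8.1 shown in this report. The values are examples only, so check the docs for your version before applying.

```yaml
# flux-system/kustomization.yaml (sketch; flag values are examples)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # Append Helm index cache flags to the source-controller container args
  - target:
      kind: Deployment
      name: source-controller
    patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --helm-cache-max-size=10
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --helm-cache-ttl=60m
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --helm-cache-purge-interval=5m
```

Using `add` with the `/-` suffix appends to the existing `args` list instead of replacing it, so the controller's default flags are preserved.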
@kav can you please move this into a separate issue? I did a small test yesterday evening and was indeed able to apply a resource with an invalid interval format, but the cluster I was testing on wasn’t running any controllers at the time, so I wasn’t able to validate the crash.