image-automation-controller: "unable to clone: SSH could not read data: Error waiting on socket" after updating to v0.19
I updated Flux from version 0.18 to the latest, 0.25.2, which included an update of image-automation-controller from 0.15 to 0.19.
After that, there was one successful commit to the repository. After that commit everything stopped, and killing the pod does not help; the same message appears every time. I have turned on debug-level logging, but it did not provide anything more useful than this error:
{
"level": "error",
"ts": "2022-01-17T14:09:25.041Z",
"logger": "controller.imageupdateautomation",
"msg": "Reconciler error",
"reconciler group": "image.toolkit.fluxcd.io",
"reconciler kind": "ImageUpdateAutomation",
"name": "my-gitops",
"namespace": "flux-system",
"error": "unable to clone: SSH could not read data: Error waiting on socket",
"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227"
}
I have tried to downgrade to 0.15 and then a weird thing happened: all the commits went through, but right after that a similar error appeared:
{
"level": "error",
"ts": "2022-01-17T14:13:17.047Z",
"logger": "controller-runtime.manager.controller.imageupdateautomation",
"msg": "Reconciler error",
"reconciler group": "image.toolkit.fluxcd.io",
"reconciler kind": "ImageUpdateAutomation",
"name": "my-gitops",
"namespace": "flux-system",
"error": "unable to clone 'ssh://git@bitbucket.org/myorg/my-gitops', error: SSH could not read data: Error waiting on socket"
}
Source controller does not have this trouble. I am using Bitbucket Cloud. I have seen a similar error posted elsewhere, but that one seems to be resolved by restarting the pod each time, while I am not so lucky, so it's probably best to track this as a separate issue.
Is there any workaround I can use? Automation has all but stopped for us because of it.
About this issue
- State: closed
- Created 2 years ago
- Comments: 19 (7 by maintainers)
This issue had been resolved for us up through Flux v0.27.2, but after updating to Flux v0.27.3 / image-automation-controller v0.20.1, we’re seeing this issue intermittently once again. Could be related to the fix for #316?
You can try updating to 0.26.0; that helped a lot (at least in my case), together with raising the timeout.
I think we need to change the default GitRepository timeout in source-controller; 20s is way too low for libgit2.
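For anyone who wants to try the timeout workaround mentioned above, here is a rough sketch of a GitRepository with a longer timeout. The `spec.timeout` field is part of the GitRepository API, and the URL is taken from the log in the original report. Everything else (resource name, branch, interval, Secret name, the `v1beta1` API version, and the 60s value itself) is an assumption for illustration. Whether image-automation-controller honors this value may depend on the controller version, so treat this as something to try rather than a confirmed fix.

```yaml
# Sketch only: GitRepository with a longer timeout for remote Git operations.
apiVersion: source.toolkit.fluxcd.io/v1beta1   # assumed: API version served by Flux 0.2x
kind: GitRepository
metadata:
  name: my-gitops              # assumed: same name as the ImageUpdateAutomation in the logs
  namespace: flux-system
spec:
  interval: 1m                 # assumed value
  url: ssh://git@bitbucket.org/myorg/my-gitops
  ref:
    branch: main               # assumed branch
  secretRef:
    name: my-gitops-ssh        # hypothetical Secret holding the SSH key
  timeout: 60s                 # raised from the 20s default; 60s is an arbitrary choice
```

If the clone still fails with the longer timeout, the timeout is probably not the bottleneck and the SSH connection itself is worth looking at.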