image-automation-controller: CrashLoopBackOff with versions >= 0.20.2

Issue

Since upgrading to 0.20.2, the image-automation-controller has stopped working.

Running flux reconcile image update never completes.

The image-automation-controller fails periodically. The /readyz endpoint does not return 200, but it also provides no further information.

The controller emits no logs for these failures.

How to reproduce

In our case: upgrade image-automation-controller to a version > 0.20.1, then trigger an image update.
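When checking several clusters, a quick version check can flag which controller tags fall in the range discussed here. This sketch assumes the range is [0.20.2, 0.24.0), based only on this thread (the crash appearing from 0.20.2, and the maintainer's later note that it is fixed in v0.24.0); individual reports differ. inAffectedRange and parseVersion are hypothetical helpers, and tags that are not plain MAJOR.MINOR.PATCH (such as rc-41c0ddc4) are rejected rather than guessed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses "vMAJOR.MINOR.PATCH" or "MAJOR.MINOR.PATCH".
// Pre-release suffixes and rc-<sha> tags are not supported.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("not a MAJOR.MINOR.PATCH version: %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("bad component %q in %q: %w", p, s, err)
		}
		v[i] = n
	}
	return v, nil
}

// less reports whether version a sorts before version b.
func less(a, b [3]int) bool {
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// inAffectedRange reports whether version is in the assumed range [0.20.2, 0.24.0).
func inAffectedRange(version string) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	first := [3]int{0, 20, 2} // first version reported as crashing
	fixed := [3]int{0, 24, 0} // version reported as containing the fix
	return !less(v, first) && less(v, fixed), nil
}

func main() {
	for _, tag := range []string{"v0.20.1", "v0.20.2", "v0.21.0", "v0.24.0"} {
		affected, err := inAffectedRange(tag)
		if err != nil {
			fmt.Println(tag, "error:", err)
			continue
		}
		fmt.Printf("%s affected=%v\n", tag, affected)
	}
}
```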

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 6
  • Comments: 71 (25 by maintainers)


Most upvoted comments

@saltyblu this has now been fixed as of yesterday’s release: ghcr.io/fluxcd/image-automation-controller:v0.24.0.

The second issue you mentioned will be tracked separately as part of #415.

Thank you for reporting and testing the changes. 🙇

For everybody else’s benefit: two users experiencing this issue have already confirmed that the RC version works, so we will start working towards getting it merged. Depending on the upstream dependencies, this may take some time to make its way into the official releases.

Please report any issues or confirm (✔️) this is working for you.

Here’s the release candidate image: ghcr.io/fluxcd/image-automation-controller:rc-41c0ddc4. It would be great if folks experiencing this issue could test it and let us know whether or not it fixes it.

I am still facing this issue despite using all the latest images.

I added some tracing and built an image using the method above. It seems to be crashing silently at https://github.com/fluxcd/image-automation-controller/blob/4d70651fbee5e06bf89e2d48a633f53a1415cd0d/controllers/imageupdateautomation_controller.go#L619

Specifically, I added these traces around it:

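// Trace immediately before the suspected call.
tracelog.Info("Fixing to try index.AddByPath(rel)", "rel", rel)
if err := index.AddByPath(rel); err != nil {
    // Reached only if AddByPath returned an error.
    tracelog.Info("adding file", "file", rel, "error", err)
    return err
}
// Reached only if AddByPath returned nil; err is scoped to the if
// statement above, so it cannot be referenced here.
tracelog.Info("Got error adding file", "file", rel)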

I set the log level to trace, and the last log line I got was:

{"level":"Level(-2)","ts":"2022-04-14T15:49:36.428Z","logger":"controller.imageupdateautomation","msg":"Fixing to try index.AddByPath(rel)","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"demo","namespace":"demo","rel":".editorconfig"}

We never reach the “adding file” trace inside the if, nor the additional traces I added after it.

Note: .editorconfig is the first file listed when running ls in the Git repository.

We have reverted to 0.20.1 as well, and that seems stable so far.

@robertscherbarth I am working on an RC version with a potential fix, aiming to release it today.

@aryan9600 We’re using AWS EKS. Before testing in a Kind setup, I have to create a more isolated setup that still reproduces the problem. I can give it a try next week.

Hi @aryan9600, here is the log output with the rc-3beb7911 image: rc-3beb7911_trace_log.txt

Hi @aryan9600, thanks for the quick reply. The file I shared above contains all logs up to the point where the pod crashed.

@BertelBB the gitImplementation field is only used by source-controller. Since v0.21.0, the image-automation-controller uses libgit2 exclusively, but that should not require any changes to your setup.

Please revert to the last working version whilst I investigate this.