argo-cd: Sync gets stuck and must be terminated/restarted manually in order to work

Describe the bug

I’m trying to deploy an ArgoCD Application that contains a configmap, two certificates (Certificate custom resource from cert-manager) and a KafkaConnect instance (from Strimzi operator).

I defined the following annotations: argocd.argoproj.io/sync-wave (so that the ConfigMap and certificates exist before the KafkaConnect instance) and argocd.argoproj.io/sync-options on the custom resources. When the application is deployed, the sync gets stuck: it stays in OutOfSync and Syncing (see attached image). However, if I stop the sync (click on Syncing, then Terminate) and then sync the application again, it successfully deploys all the defined resources.
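
For illustration, the annotations described above might look roughly like this. It is a minimal sketch, not the reporter's actual manifests: the resource names, the API versions, the spec fields, and the specific sync option (SkipDryRunOnMissingResource=true) are all assumptions, and only one of the two certificates is shown.

```yaml
# Wave 0: prerequisites, applied first.
apiVersion: v1
kind: ConfigMap
metadata:
  name: connect-config                  # hypothetical name
  annotations:
    argocd.argoproj.io/sync-wave: "0"
data:
  example.key: example-value            # placeholder content
---
apiVersion: cert-manager.io/v1          # assumed; match the installed cert-manager version
kind: Certificate
metadata:
  name: connect-tls                     # hypothetical name
  annotations:
    argocd.argoproj.io/sync-wave: "0"
    # Assumed option: skip the dry run if the CRD is not yet known to the cluster.
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  secretName: connect-tls
  dnsNames:
    - connect.example.com               # placeholder host
  issuerRef:
    name: example-issuer                # hypothetical issuer
    kind: ClusterIssuer
---
# Wave 1: applied only after wave 0 resources report healthy.
apiVersion: kafka.strimzi.io/v1beta2    # assumed; match the installed Strimzi version
kind: KafkaConnect
metadata:
  name: my-connect                      # hypothetical name
  annotations:
    argocd.argoproj.io/sync-wave: "1"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092   # placeholder address
```

Argo CD applies lower waves first and moves on to the next wave only once the resources in the current wave are healthy, so the health status reported for the Certificate directly affects when the KafkaConnect wave can start.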

Although I am using custom resources here (Certificate from cert-manager and KafkaConnect from Strimzi), the related custom health checks seem to exist already (https://github.com/argoproj/argo-cd/tree/master/resource_customizations).

The main problem is that I have several applications of this kind, so I would like to be able to automate this (instead of relying on manually stopping the sync and restarting it for all these applications). Any idea?

To Reproduce

Deploy an ArgoCD Application that contains the resources mentioned above. The Sync phase will start by itself and get stuck.

Expected behavior

The sync should run to completion without getting stuck and without needing any manual action (terminating the sync and starting it again).

Screenshots

[Screenshot: argocd]

Version

argocd: v1.7.6+b04c25e
  BuildDate: 2020-09-19T00:50:44Z
  GitCommit: b04c25eca8f1660359e325acd4be5338719e59a0
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: linux/amd64
argocd-server: v1.7.6+b04c25e
  BuildDate: 2020-09-19T00:52:04Z
  GitCommit: b04c25eca8f1660359e325acd4be5338719e59a0
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: {Version:kustomize/v3.6.1 GitCommit:c97fa946d576eb6ed559f17f2ac43b3b5a8d5dbd BuildDate:2020-05-27T20:47:35Z GoOs:linux GoArch:amd64}
  Helm Version: version.BuildInfo{Version:"v3.2.0", GitCommit:"e11b7ce3b12db2941e90399e874513fbd24bcb71", GitTreeState:"clean", GoVersion:"go1.13.10"}
  Kubectl Version: v1.17.8

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 15
  • Comments: 22 (5 by maintainers)

Most upvoted comments

Hi, I'm having a very similar issue. Is there at least a way to automatically terminate a long-running sync? I could not find a setting for this.

Thinking about this a bit more: could the application syncing issue be just a symptom of a wider issue? The health check does eventually reach a Healthy state, but it reports “Degraded” while the certificate issuance is pending.

I can make a PR for the small health check change, which reduces issues when using cert-manager certificates, but that won’t fix the underlying sync issue: after a resource enters a degraded state, the application waits for all resources to report “healthy” and seemingly deadlocks, or waits for an event that never arrives.

I have been playing with a similar issue on version v1.7.8+ef5010c, in an app of apps scenario.

When the sync hit a “Degraded” state during certificate issuance, the application seemed to start waiting on everything all over again and never received the healthy notifications.

I think it is an issue with the certificate health check. I overrode the default check by changing the argocd-cm.yaml file as shown below (only replacing Degraded with Progressing):

data:
  resource.customizations: |
    cert-manager.io/Certificate:
      health.lua: |
        hs = {}
        if obj.status ~= nil then
          if obj.status.conditions ~= nil then
            for i, condition in ipairs(obj.status.conditions) do
              if condition.type == "Ready" and condition.status == "False" then
                hs.status = "Progressing"
                hs.message = condition.message
                return hs
              end
              if condition.type == "Ready" and condition.status == "True" then
                hs.status = "Healthy"
                hs.message = condition.message
                return hs
              end
            end
          end
        end

        hs.status = "Progressing"
        hs.message = "Waiting for certificate"
        return hs

This seems to have resolved the issue for me (at least in the very small number of tests I have done since).

One interesting symptom: When ArgoCD is stuck with “waiting for completion of hook …” no operation works with that Application resource until the sync operation is manually terminated.

For example:

  • Application gets stuck with “waiting for completion of hook …”
  • Application receives DELETE via the ArgoCD HTTP API.
  • Nothing changes, Application can stay in this state for days until the Sync is manually terminated.
  • Once the sync is manually terminated, the deletion (issued days ago) immediately proceeds and Application is deleted.

@rbreeze, I was running v1.7.6, the default appVersion of the Helm chart at the time. I also encountered issues where deletion got stuck.

However, I have since updated to v1.8.3 and I am no longer facing any issues! Everything is working as expected.

I have the same issue that @lifelofranco mentioned, on Argo CD v1.8.3+0f9c684 (build date 2021-01-21T22:20:39Z).