argo-cd: Applications from ApplicationSet flip rapidly between "Unknown" and "Synchronised"

Checklist:

  • I’ve searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I’ve included steps to reproduce the bug.
  • I’ve pasted the output of argocd version.

Describe the bug

Since upgrading from Argo CD 2.8.4 to 2.9.0, our applications generated via ApplicationSets flap multiple times a second between “Synchronised” and “Unknown” in the UI. From diffing the generated Application as per https://argo-cd.readthedocs.io/en/stable/operator-manual/reconcile/#finding-resources-to-ignore, the sync status and repoURL under status -> sync are constantly flipping between “” and the desired value. I’ve included the logs further down.

The outcome is that Argo CD essentially hammers itself with this constant, rapid flipping between states. The application controller logs below illustrate this behaviour.

We used Argo CD Autopilot to generate our ApplicationSets last year, and I have found that removing ignoreDifferences from the ApplicationSet template spec stops the flapping. I’m not sure whether this is expected behaviour, as creating an Application directly with ignoreDifferences configured doesn’t seem to cause it.

To Reproduce

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "0"
  creationTimestamp: null
  name: cluster-resources
  namespace: argocd
spec:
  generators:
    - git:
        files:
          - path: kubernetes/bootstrap/cluster-resources/*.json
        repoURL: github.com/***
        requeueAfterSeconds: 20
        revision: main
        template:
          metadata: {}
          spec:
            destination: {}
            project: ""
            source:
              repoURL: ""
  syncPolicy:
    preserveResourcesOnDeletion: true
  template:
    metadata:
      labels:
        app.kubernetes.io/managed-by: argocd-autopilot
        app.kubernetes.io/name: cluster-resources-{{name}}
      name: cluster-resources-{{name}}
      namespace: argocd
    spec:
      destination:
        server: "{{server}}"
      # Removing this stops the flapping
      ignoreDifferences:
        - group: argoproj.io
          jsonPointers:
            - /status
          kind: Application
      project: default
      source:
        path: kubernetes/bootstrap/cluster-resources/{{name}}
        repoURL: https://github.com/***
        targetRevision: main
      syncPolicy:
        automated:
          allowEmpty: true
          selfHeal: true
status: {}

Where the cluster resources directory contains a file called “in-cluster.json”:

{"name":"in-cluster","server":"https://kubernetes.default.svc"}

and a folder called “in-cluster” that contains a namespace definition (argocd-ns.yaml):

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    argocd.argoproj.io/sync-options: Prune=false
  creationTimestamp: null
  name: argocd

This isn’t the only ApplicationSet where this is happening, but it was the most straightforward reproduction case for us.

Expected behavior

We expect the applications to remain in Synchronised status.

Version

argocd version
argocd: v2.9.0+9cf0c69
  BuildDate: 2023-11-06T04:43:50Z
  GitCommit: 9cf0c69bbe70393db40e5755e34715f30179ee09
  GitTreeState: clean
  GoVersion: go1.21.3
  Compiler: gc
  Platform: linux/amd64

Logs

Application controller logs. Within a single second, the controller repeatedly logs “Updated sync status:  -> Synced”:

time="2023-11-07T10:42:04Z" level=info msg="Updated sync status:  -> Synced" application=cluster-resources-in-cluster dest-namespace= dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal
time="2023-11-07T10:42:04Z" level=info msg="Update successful" application=argocd/cluster-resources-in-cluster
time="2023-11-07T10:42:04Z" level=debug msg="Requesting app refresh caused by object update" api-version=argoproj.io/v1alpha1 application=argocd/autopilot-bootstrap cluster-name= fields.level=0 kind=Application name=cluster-resources-in-cluster namespace=argocd server="https://kubernetes.default.svc"
time="2023-11-07T10:42:04Z" level=info msg="Reconciliation completed" application=argocd/cluster-resources-in-cluster dedup_ms=0 dest-name= dest-namespace= dest-server="https://kubernetes.default.svc" diff_ms=2 fields.level=3 git_ms=289 health_ms=0 live_ms=0 patch_ms=11 setop_ms=0 settings_ms=0 sync_ms=0 time_ms=316
time="2023-11-07T10:42:04Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=argocd/autopilot-bootstrap
time="2023-11-07T10:42:04Z" level=info msg="No status changes. Skipping patch" application=argocd/autopilot-bootstrap
time="2023-11-07T10:42:04Z" level=info msg="Reconciliation completed" application=argocd/autopilot-bootstrap dest-name= dest-namespace=argocd dest-server="https://kubernetes.default.svc" fields.level=0 patch_ms=0 setop_ms=0 time_ms=6
time="2023-11-07T10:42:04Z" level=debug msg="Requesting app refresh caused by object update" api-version=argoproj.io/v1alpha1 application=argocd/autopilot-bootstrap cluster-name= fields.level=0 kind=Application name=cluster-resources-in-cluster namespace=argocd server="https://kubernetes.default.svc"
time="2023-11-07T10:42:04Z" level=info msg="Refreshing app status (spec.source differs), level (3)" application=argocd/cluster-resources-in-cluster
time="2023-11-07T10:42:04Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=argocd/autopilot-bootstrap
time="2023-11-07T10:42:04Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: )" application=argocd/cluster-resources-in-cluster
time="2023-11-07T10:42:04Z" level=debug msg="Generating Manifest for source {https://github.com/*** kubernetes/bootstrap/cluster-resources/in-cluster 2.9-speculative-fix nil nil nil nil  } revision 2.9-speculative-fix"
time="2023-11-07T10:42:04Z" level=info msg="No status changes. Skipping patch" application=argocd/autopilot-bootstrap
time="2023-11-07T10:42:04Z" level=info msg="Reconciliation completed" application=argocd/autopilot-bootstrap dest-name= dest-namespace=argocd dest-server="https://kubernetes.default.svc" fields.level=0 patch_ms=0 setop_ms=0 time_ms=7
time="2023-11-07T10:42:05Z" level=info msg="getRepoObjs stats" application=argocd/cluster-resources-in-cluster build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=298 unmarshal_ms=297 version_ms=0
time="2023-11-07T10:42:05Z" level=debug msg="Retrieved live manifests" application=argocd/cluster-resources-in-cluster
time="2023-11-07T10:42:05Z" level=info msg="Skipping auto-sync: application status is Synced" application=argocd/cluster-resources-in-cluster
time="2023-11-07T10:42:05Z" level=info msg="Updated sync status:  -> Synced" application=cluster-resources-in-cluster dest-namespace= dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal
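
To put a number on the flapping, a small script can count the “Updated sync status” events per second in saved controller logs. This is only a minimal sketch and not part of Argo CD: the function name count_sync_updates and the regular expressions are my own assumptions, based solely on the log format shown above.

```python
import re
from collections import Counter

# Patterns for the logfmt fields used in the controller output above.
TIME_RE = re.compile(r'time="([^"]+)"')
MSG_RE = re.compile(r'msg="([^"]*)"')
APP_RE = re.compile(r'\bapplication=(\S+)')


def count_sync_updates(lines):
    """Count 'Updated sync status' events per (timestamp, application)."""
    counts = Counter()
    for line in lines:
        msg = MSG_RE.search(line)
        if not msg or not msg.group(1).startswith("Updated sync status"):
            continue
        ts = TIME_RE.search(line)
        app = APP_RE.search(line)
        if ts and app:
            counts[(ts.group(1), app.group(1))] += 1
    return counts


# Two lines abridged from the logs in this issue.
sample = [
    'time="2023-11-07T10:42:04Z" level=info msg="Updated sync status:  -> Synced" '
    'application=cluster-resources-in-cluster reason=ResourceUpdated type=Normal',
    'time="2023-11-07T10:42:04Z" level=info msg="Update successful" '
    'application=argocd/cluster-resources-in-cluster',
]

for (timestamp, app), n in sorted(count_sync_updates(sample).items()):
    print(timestamp, app, n)
```

Because the timestamps only have one-second resolution, a healthy application should show few such events per key, while a flapping one shows them in nearly every second.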

Diff of the Application CR. The status.sync section seems to switch rapidly between the following two states:

  sync:
    comparedTo:
      destination:
        server: https://kubernetes.default.svc
      ignoreDifferences:
      - group: argoproj.io
        jsonPointers:
        - /status
        kind: Application
      source:
        path: kubernetes/bootstrap/cluster-resources/in-cluster
        repoURL: ""
        targetRevision: 2.9-speculative-fix
    revision: cda88b740ba847be6bb94172834e4b6971099956
    status: ""



  sync:
    comparedTo:
      destination:
        server: https://kubernetes.default.svc
      ignoreDifferences:
      - group: argoproj.io
        jsonPointers:
        - /status
        kind: Application
      source:
        path: kubernetes/bootstrap/cluster-resources/in-cluster
        repoURL: https://github.com/***
        targetRevision: 2.9-speculative-fix
    revision: cda88b740ba847be6bb94172834e4b6971099956
    status: Synced
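
To make the difference between the two snapshots explicit, they can be flattened and compared field by field. The sketch below is just an illustration: flatten and changed_fields are hypothetical helper names, and the dictionaries are abridged by hand from the two states above rather than read from a cluster.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a {dotted.path: leaf_value} mapping."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def changed_fields(before, after):
    """Return {path: (before_value, after_value)} for every differing leaf."""
    a, b = flatten(before), flatten(after)
    return {
        path: (a.get(path), b.get(path))
        for path in sorted(a.keys() | b.keys())
        if a.get(path) != b.get(path)
    }


# Abridged copies of the two status.sync states shown above.
state_a = {"sync": {
    "comparedTo": {"source": {"repoURL": "", "targetRevision": "2.9-speculative-fix"}},
    "revision": "cda88b740ba847be6bb94172834e4b6971099956",
    "status": "",
}}
state_b = {"sync": {
    "comparedTo": {"source": {"repoURL": "https://github.com/***", "targetRevision": "2.9-speculative-fix"}},
    "revision": "cda88b740ba847be6bb94172834e4b6971099956",
    "status": "Synced",
}}

for path, (old, new) in changed_fields(state_a, state_b).items():
    print(f"{path}: {old!r} -> {new!r}")
```

Only the repoURL under comparedTo.source and the sync status differ; the revision and targetRevision are identical in both states.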

About this issue

  • State: closed
  • Created 8 months ago
  • Reactions: 11
  • Comments: 19 (7 by maintainers)

Most upvoted comments

I think we’re seeing this at Intuit, too. Looking into it…

v2.9.1 seems to have fixed this issue for us! Thanks a lot @crenshaw-dev 👍 The repo server is also back to normal; I had seen some anomalies there that I initially failed to mention.

I can’t reproduce the issue on release-2.9 now that https://github.com/argoproj/argo-cd/pull/16299 is merged. In the interest of time, I’ll skip the deep-dive into why this bug exists and instead cut 2.9.1. If other weirdness appears, we’ll tackle that as it comes. 😃 Thanks everyone for your patience and help! I’ll post here again when 2.9.1 is out.

We’ve rolled this out too and I agree that it seems to be fixed in 2.9.1. Thanks a lot for picking this up so quickly!

This reproduces the issue in 2.9.0:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: bug-16260
  namespace: argocd
spec:
  generators:
    - git:
        directories:
          - path: helm-guestbook
        repoURL: https://github.com/argoproj/argocd-example-apps.git
        requeueAfterSeconds: 10
        revision: master
  template:
    metadata:
      name: bug-16260
    spec:
      project: default
      source:
        repoURL: https://github.com/argoproj/argocd-example-apps.git
        path: helm-guestbook
      destination:
        server: https://kubernetes.default.svc
        namespace: default
      ignoreDifferences:
      - group: argoproj.io
        jsonPointers:
        - /status
        kind: Application

@crenshaw-dev do we have an ETA on 2.9.1?

Just saw it got released 15 minutes ago, thanks a lot!

Why are you ignoring the status of the applications?

      ignoreDifferences:
      - group: argoproj.io
        jsonPointers:
        - /status

With the status ignored, the application’s .status.sync.status flicks from Synced to “”. The UI displays the change until the status is restored.

It looks like the ApplicationSet controller is blowing away the Application status because it’s ignored, and then the application controller restores it. https://github.com/argoproj/argo-cd/pull/14743 is where the relevant ApplicationSet code was introduced, and the behaviour starts in 2.9.0.

Interestingly enough, that code was refactored in https://github.com/argoproj/argo-cd/pull/15965, which went into v2.9.1. From a quick glance I suspect the issue is still there, but it may be worth a test.
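
The hypothesis about the two controllers overwriting each other can be illustrated with a toy model. This is purely a sketch of that guess, not Argo CD code: appset_write, app_controller_write and simulate are hypothetical names, and the assumption that the ApplicationSet side writes the object back without its status is taken from the comment above, not from the source.

```python
import copy

SYNCED = {"sync": {"status": "Synced"}}


def appset_write(template):
    """Assumed ApplicationSet-side write: the object is rebuilt from the
    template, and the ignored /status is not carried over."""
    return {"spec": copy.deepcopy(template["spec"]), "status": {}}


def app_controller_write(live):
    """Assumed application-controller write: recompute and restore status."""
    restored = copy.deepcopy(live)
    restored["status"] = copy.deepcopy(SYNCED)
    return restored


def sync_status(app):
    """Read .status.sync.status, returning "" when it is missing."""
    return app["status"].get("sync", {}).get("status", "")


def simulate(rounds):
    """Alternate the two writers and record the status seen after each write."""
    template = {"spec": {"project": "default"}}
    live = app_controller_write(appset_write(template))
    seen = []
    for _ in range(rounds):
        live = appset_write(template)
        seen.append(sync_status(live))
        live = app_controller_write(live)
        seen.append(sync_status(live))
    return seen


print(simulate(2))
```

In this model neither writer ever reaches a state the other accepts, so the status alternates between “” and Synced for as long as both keep reconciling, which is the same pattern as in the reported diff.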