argo-cd: Sync-waves not working as expected

Checklist:

  • I’ve searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I’ve included steps to reproduce the bug.
  • I’ve pasted the output of argocd version.

ArgoCD 2.4.20

Describe the bug

When syncing with prune after removing objects, the objects are not deleted according to their sync-waves.

To Reproduce

I have an ArgoCD app with several resources. The one with the lowest sync-wave is a Namespace, at -10. All the other objects have a sync-wave of 1. When I remove all the objects and sync (with prune), all the objects are deleted at the same time. The following video shows the Namespace being marked as Terminating immediately after the sync is invoked.

https://user-images.githubusercontent.com/3435696/217853336-e6d2f027-17db-4764-949a-3625cee6a00a.mp4

Other objects, in this case some Secrets and a BareMetalHost, take longer to be deleted, but there is no wait between waves.

I have been looking at this feature: https://github.com/argoproj/argo-cd/pull/3959 which should implement these waits, according to this code: https://github.com/argoproj/argo-cd/pull/3959/files/c2e5ccc81e0aaca81a2d002436d39d52a4364d2d#diff-f952d05ea83b61f771541425e28fa3931af2f2e7950261dd59cab93bdcfe2e9e In fact, there is a comment there saying that the deletion of one wave's objects is awaited before moving on to the next wave.
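To make the expected "wait between waves" concrete, here is a minimal standalone sketch of the ordering I would expect during pruning. This is a toy model, not the actual argo-cd/gitops-engine code; the `Resource` type and `pruneOrder` function are hypothetical names invented for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a hypothetical stand-in for a live object tracked by the sync.
type Resource struct {
	Name string
	Wave int // value of the argocd.argoproj.io/sync-wave annotation
}

// pruneOrder groups resources by wave, highest wave first: during pruning,
// waves should run in reverse of the creation order, and each group should
// be fully deleted before the next one starts.
func pruneOrder(resources []Resource) [][]Resource {
	byWave := map[int][]Resource{}
	for _, r := range resources {
		byWave[r.Wave] = append(byWave[r.Wave], r)
	}
	waves := make([]int, 0, len(byWave))
	for w := range byWave {
		waves = append(waves, w)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(waves))) // highest wave deleted first
	groups := make([][]Resource, 0, len(waves))
	for _, w := range waves {
		groups = append(groups, byWave[w])
	}
	return groups
}

func main() {
	groups := pruneOrder([]Resource{
		{Name: "namespace", Wave: -10},
		{Name: "secret", Wave: 1},
		{Name: "baremetalhost", Wave: 1},
	})
	for _, g := range groups {
		fmt.Println(g) // wave 1 objects first, the namespace last
	}
}
```

With the bug described above, ArgoCD instead issues all the deletions at once, as if there were a single group.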

Expected behavior

The Namespace object is deleted only after the other objects have been deleted.

Screenshots

Version


ArgoCD 2.4.20

Logs

Paste any relevant application logs here.

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Reactions: 5
  • Comments: 20 (9 by maintainers)

Commits related to this issue

Most upvoted comments

I think I have a simple reproducer for this:

It’s not as easy as I initially thought. According to @jannfis’s new comment, the code for sync prune is different from what we were originally looking at; it lives in the gitops-engine repository.

At: https://github.com/argoproj/gitops-engine/blob/ed70eac8b7bd6b2f276502398fdbccccab5d189a/pkg/sync/sync_context.go#L387 

The core goal of this card is to have a test that lets us verify when this bug is fixed. So the best approach is not to write a unit test, at least not for now, but to have a reproducer: right now it fails (because of the wrong behavior), and once we fix the code it should pass.

Here’s the reproducer I came up with:

  1. Setup ArgoCD for development purposes (see my tutorial).
  2. Use my GitHub repository, called syncwavetest; inside it you will find a manifest.yaml file. Start ArgoCD and deploy my manifest with the following command:

./dist/argocd app create syncwaves-example --repo="https://github.com/drpaneas/syncwavetest" --path="./" --dest-server="https://kubernetes.default.svc" --sync-policy="auto" --project="default" --dest-namespace="default"

This will create a namespace called syncwavetest and three pods, podtest{1,2,3}, inside it. If you look at the manifest, I have attached the following sync-wave annotations:

namespace syncwavetest -> argocd.argoproj.io/sync-wave: "10"
pod podtest1 -> argocd.argoproj.io/sync-wave: "20"
pod podtest2 -> argocd.argoproj.io/sync-wave: "30"
pod podtest3 -> argocd.argoproj.io/sync-wave: "40"

Notice that ArgoCD correctly creates them in the appropriate order (you can see this with kubectl get events -A --sort-by='.metadata.creationTimestamp' | grep 'syncwavetest\|podtest')
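As a side note, the wave values above come from the argocd.argoproj.io/sync-wave annotation, and resources without it default to wave 0. A minimal sketch of reading that annotation (the `syncWave` helper is a hypothetical name; in this sketch a malformed value also falls back to 0):

```go
package main

import (
	"fmt"
	"strconv"
)

// syncWave reads the argocd.argoproj.io/sync-wave annotation from an
// object's metadata, defaulting to 0 when it is missing or malformed
// (0 is ArgoCD's default wave).
func syncWave(annotations map[string]string) int {
	v, ok := annotations["argocd.argoproj.io/sync-wave"]
	if !ok {
		return 0
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return 0
	}
	return n
}

func main() {
	fmt.Println(syncWave(map[string]string{"argocd.argoproj.io/sync-wave": "20"})) // 20
	fmt.Println(syncWave(nil))                                                     // 0
}
```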

  3. Now go to my GitHub repository and empty this manifest (remove its contents). Force-push with HEAD~1. Then go back to ArgoCD and click Refresh. You will notice that it says “OutOfSync”. This is correct, because Git has changed (it now has nothing), while the cluster still has the objects deployed.

  4. Click “Sync” and also select “Prune” from the options, and see what happens:

Actual Behavior:

ArgoCD tried to delete everything at once (the namespace and the pods). See my screenshot:

sync_prune (1)

You can also verify this behavior via terminal:

drpaneas@linux:~/syncwaves$ kubectl -n syncwavetest get pods
NAME       READY   STATUS        RESTARTS   AGE
podtest1   0/1     Terminating   0          23h
podtest2   0/1     Terminating   0          23h
# podtest3 has been deleted already

As you can see, ArgoCD tried to delete everything, but it couldn’t delete podtest1, podtest2, or the namespace. It couldn’t, because in my original manifest.yaml file (the one you deployed) I included finalizers that block these pods from getting deleted. This was done on purpose, so we can witness the Terminating and 0/1 status, indicating that ArgoCD didn’t respect the sync-wave order during pruning.

Expected behavior:

Since, according to the sync-waves attached to the pods, podtest2 needs to be deleted before podtest1, I would expect ArgoCD not to even try to delete podtest1 while podtest2 is still there. Yes, it tried to delete it, but it didn’t check whether it was actually deleted/recycled. I blocked the deletion of podtest2 and podtest1 on purpose so we can see this. In other words, this is the output I would expect:

drpaneas@linux:~/syncwaves$ kubectl -n syncwavetest get pods
NAME       READY   STATUS        RESTARTS   AGE
podtest1   1/1     Ready         0          23h
podtest2   0/1     Terminating   0          23h

So podtest3 is correctly deleted first in that case, and podtest2 is Terminating, which looks fine. The only problem is podtest1: it should say Ready, not Terminating, because ArgoCD shouldn’t try to delete it before podtest2 is actually recycled.
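The gating I expect can be stated as a simple predicate: a resource in wave N may be pruned only once no live resource in a higher wave remains. A standalone sketch of that rule (the `LiveResource` type and `canPrune` function are hypothetical names, modeling the expected behavior rather than the actual code):

```go
package main

import "fmt"

// LiveResource models an object still present in the cluster.
type LiveResource struct {
	Name string
	Wave int
}

// canPrune reports whether a resource in the given wave may be deleted:
// only when no live resource with a higher wave remains, since higher
// waves must be pruned (and fully gone) first.
func canPrune(wave int, live []LiveResource) bool {
	for _, r := range live {
		if r.Wave > wave {
			return false
		}
	}
	return true
}

func main() {
	// podtest2 (wave 30) is still Terminating, so podtest1 (wave 20) must wait.
	live := []LiveResource{{"podtest1", 20}, {"podtest2", 30}}
	fmt.Println(canPrune(20, live)) // false: podtest2 (wave 30) is still live
	fmt.Println(canPrune(30, live)) // true: nothing above wave 30 remains
}
```

In the reproducer above, this predicate would keep podtest1 at Ready until podtest2 is actually gone.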

Hello everyone! 👋 I’ve created a patch (https://github.com/argoproj/gitops-engine/pull/538) to address this issue. I’d love for some folks to give it a try and share your thoughts. If you’re up for it, let me know!

I have created an enhancement proposal (#15074) to address this. Please let me know what you think.

It looks like this never worked as intended.

Any news on this issue? I hit the same problem: sync-wave deletion order is not working as expected. I tried both positive and negative sync-waves, but no luck.