argo-cd: Sync-waves not working as expected
Checklist:
- I’ve searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
- I’ve included steps to reproduce the bug.
- I’ve pasted the output of `argocd version`: ArgoCD 2.4.20
Describe the bug
When a sync deletes objects, they are not deleted according to their sync-waves.
To Reproduce
I have an ArgoCD app with different resources. The one with the lowest sync-wave is a Namespace, at -10. All the other objects have sync-wave 1. When I delete all the objects and sync (with prune), all the objects are deleted at the same time. In the following video you can see how the Namespace is marked as Terminating immediately after the sync is invoked.
https://user-images.githubusercontent.com/3435696/217853336-e6d2f027-17db-4764-949a-3625cee6a00a.mp4
Other objects, in this case some Secrets and a BareMetalHost, take longer to be deleted, but there is no wait between waves.
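For reference, the wave annotations look roughly like this (a minimal sketch: the wave values come from the description above, but the resource names are placeholders):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace                     # placeholder name
  annotations:
    argocd.argoproj.io/sync-wave: "-10"  # lowest wave: created first, expected to be deleted last
---
apiVersion: v1
kind: Secret
metadata:
  name: my-secret                        # placeholder; every non-Namespace resource uses wave 1
  annotations:
    argocd.argoproj.io/sync-wave: "1"
```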
I have been looking at this feature: https://github.com/argoproj/argo-cd/pull/3959, which should implement these waits, according to this code: https://github.com/argoproj/argo-cd/pull/3959/files/c2e5ccc81e0aaca81a2d002436d39d52a4364d2d#diff-f952d05ea83b61f771541425e28fa3931af2f2e7950261dd59cab93bdcfe2e9e. There is actually a comment in there about this: the sync is supposed to wait for object deletion before moving on to the next wave.
Expected behavior
The Namespace object is deleted only after the other objects have been deleted.
Screenshots
Version
ArgoCD 2.4.20
Logs
Paste any relevant application logs here.
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 5
- Comments: 20 (9 by maintainers)
Commits related to this issue
- Fix Argo Deletion Again According to https://github.com/argoproj/argo-cd/issues/12376 waves don't actually work properly for deletion. I can see in the code it does a reverse sort and picks off the ... — committed to eschercloudai/helm-cluster-api by spjmurray a year ago
- Fix Argo Deletion Again (#40) According to https://github.com/argoproj/argo-cd/issues/12376 waves don't actually work properly for deletion. I can see in the code it does a reverse sort and picks ... — committed to eschercloudai/helm-cluster-api by spjmurray a year ago
I think I have a simple reproducer for this:
It’s not as easy as I initially thought. According to @jannfis’s new comment, the code for Sync Prune is different from what we were originally looking at. It lives in the gitops-engine repository.
At: https://github.com/argoproj/gitops-engine/blob/ed70eac8b7bd6b2f276502398fdbccccab5d189a/pkg/sync/sync_context.go#L387
The core goal of this card is to have a test that allows us to verify when this bug is fixed. So the best approach is not to write a unit test, at least not for now. The best approach is to have a reproducer, so that right now it fails (because of the wrong behavior) and, once we fix the code, it passes.
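To make the intended semantics concrete, here is a minimal Go sketch of what pruning by waves should do. This is illustrative only, not the actual gitops-engine code; `del` and `gone` are hypothetical callbacks standing in for the real cluster operations, and the wave values mirror the app described in the bug report above:

```go
package main

import (
	"fmt"
	"sort"
)

// resource is a stand-in for a tracked Kubernetes object; the real
// gitops-engine types are much richer.
type resource struct {
	Name string
	Wave int
}

// pruneByWave deletes objects one wave at a time, highest wave first
// (deletion runs in the reverse of creation order), and waits until every
// object in the current wave is actually gone before starting the next.
func pruneByWave(objs []resource, del func(resource), gone func(resource) bool) {
	// Group the objects by sync-wave.
	byWave := map[int][]resource{}
	for _, o := range objs {
		byWave[o.Wave] = append(byWave[o.Wave], o)
	}
	// Sort the waves in descending order.
	waves := make([]int, 0, len(byWave))
	for w := range byWave {
		waves = append(waves, w)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(waves)))

	for _, w := range waves {
		// Issue the deletions for this wave.
		for _, o := range byWave[w] {
			del(o)
		}
		// Block until the whole wave is recycled. A real implementation
		// would poll or watch the cluster instead of spinning.
		for _, o := range byWave[w] {
			for !gone(o) {
			}
		}
	}
}

func main() {
	// The app from the bug report: a Namespace at wave -10, everything else at wave 1.
	objs := []resource{
		{Name: "my-namespace", Wave: -10},
		{Name: "my-secret", Wave: 1},
		{Name: "my-baremetalhost", Wave: 1},
	}
	deleted := map[string]bool{}
	pruneByWave(objs,
		func(o resource) { fmt.Println("deleting", o.Name); deleted[o.Name] = true },
		func(o resource) bool { return deleted[o.Name] },
	)
	// Prints the wave-1 objects first and the Namespace last.
}
```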
Here’s the reproducer I came up with:
./dist/argocd app create syncwaves-example --repo="https://github.com/drpaneas/syncwavetest" --path="./" --dest-server="https://kubernetes.default.svc" --sync-policy="auto" --project="default" --dest-namespace="default"
This will create a namespace called `syncwavetest` and three pods, `podtest{1,2,3}`, inside of it. If you look at the manifest, I have attached wave annotations to each resource (see the sketch below). Notice that ArgoCD correctly sets them up in the appropriate order (you can see this with `kubectl get events -A --sort-by='.metadata.creationTimestamp' | grep 'syncwavetest\|podtest'`).
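The manifest looks roughly like this (a reconstruction, not a verbatim copy of the repo: the exact wave values and the finalizer name are assumptions, chosen to match the deletion order and the blocking finalizers described below):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: syncwavetest
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # assumed value
---
apiVersion: v1
kind: Pod
metadata:
  name: podtest1
  namespace: syncwavetest
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # assumed value
  finalizers:
    - example.com/block-deletion        # hypothetical finalizer; blocks deletion on purpose
spec:
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
---
apiVersion: v1
kind: Pod
metadata:
  name: podtest2
  namespace: syncwavetest
  annotations:
    argocd.argoproj.io/sync-wave: "2"   # assumed value
  finalizers:
    - example.com/block-deletion
spec:
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
---
apiVersion: v1
kind: Pod
metadata:
  name: podtest3
  namespace: syncwavetest
  annotations:
    argocd.argoproj.io/sync-wave: "3"   # assumed; highest wave, so it should be pruned first
spec:
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
```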
Now go to my GitHub repo and remove this manifest (make it empty, without contents). Force-push with HEAD~1. Then go back to ArgoCD and click Refresh. You will notice that it says “OutOfSync”. This is correct, because git has changed (it now has nothing), while the cluster still has the resources deployed.
Click “Sync”, select “Prune” as well from the options, and see what happens:
Actual Behavior:
ArgoCD tried to delete everything at once (the namespace and the pods). See my screenshot:
You can also verify this behavior via terminal:
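The output looked roughly like this (a reconstruction matching the behavior described below, not a verbatim paste):

```console
$ kubectl get pods -n syncwavetest
NAME       READY   STATUS        RESTARTS   AGE
podtest1   0/1     Terminating   0          5m
podtest2   0/1     Terminating   0          5m

$ kubectl get namespace syncwavetest
NAME           STATUS        AGE
syncwavetest   Terminating   5m
```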
As you can see, ArgoCD tried to delete everything, but it couldn’t delete podtest1, podtest2, or the namespace. It couldn’t, because in my original manifest.yaml file (the one you deployed) I included finalizers that block these pods from getting deleted. This was done on purpose, so we can witness the Terminating and 0/1 status, indicating that ArgoCD didn’t respect the sync-wave order during pruning.
Expected behavior:
Since, according to the sync-waves attached to the pods, podtest2 needs to be deleted before podtest1, I would expect ArgoCD to not even try to delete podtest1 while podtest2 is still there. Yes, it tried to delete it, but it didn’t check whether it was actually deleted/recycled. I’ve blocked the deletion of podtest2 and podtest1 on purpose so we can see this. In other words, this is the output I would expect:
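Sketched as terminal output (again a reconstruction, not a real paste):

```console
$ kubectl get pods -n syncwavetest
NAME       READY   STATUS        RESTARTS   AGE
podtest1   1/1     Running       0          5m
podtest2   0/1     Terminating   0          5m
```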
So podtest3 is correctly deleted first in that case, and podtest2 is trying to get deleted, so Terminating looks fine for podtest2. The only problem is podtest1, which should say Ready and not Terminating, because ArgoCD shouldn’t try to delete it before podtest2 is actually recycled.

Hello everyone! 👋 I’ve created a patch (https://github.com/argoproj/gitops-engine/pull/538) to address this issue. I’d love for some folks to give it a try and share your thoughts. If you’re up for it, let me know!
I have created an enhancement proposal (#15074) to address this. Please let me know what you think.
It looks like it never worked as supposed to.
Any news on this issue? I’ve hit the same problem: sync-wave deletion order is not working as expected. I tried positive and negative sync-waves, but no luck.