argo-cd: waiting for completion of hook and hook never succeeds
Hi,
We are seeing this issue quite often: an app sync gets stuck on "waiting for completion of hook" and the hook never completes.
As you can see below, the application got stuck in the secret-creation phase and somehow that secret never got created.
I've stripped out all unnecessary details. This is how the secret is created and used by the job:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "xxx.fullname" . }}-migrations-{{ .Chart.AppVersion }}
  annotations:
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
    helm.sh/hook-weight: "-5"
type: Opaque
data:
  xxxx
```
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-delete-policy: before-hook-creation
    helm.sh/hook-weight: "-4"
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      volumes:
        - name: app-settings
          configMap:
            name: {{ include "xxx.fullname" . }}-migrations-{{ .Chart.AppVersion }}
        - name: app-secrets
          secret:
            secretName: {{ include "xxx.fullname" . }}-migrations-{{ .Chart.AppVersion }}
```
```
kubectl -n argocd logs argocd-server-768f46f469-j98h6 | grep xxx-migrations        # No matching logs
kubectl -n argocd logs argocd-repo-server-57bdbf899c-9lxhr | grep xxx-migrations   # No matching logs
kubectl -n argocd logs argocd-repo-server-57bdbf899c-7xvs7 | grep xxx-migrations   # No matching logs
kubectl -n argocd logs argocd-server-768f46f469-tqp8p | grep xxx-migrations        # No matching logs
```
```
[testadmin@server0 ~]$ kubectl -n argocd logs argocd-application-controller-0 | grep orchestrator-migrations
time="2021-08-02T02:16:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:16:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:19:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:19:26Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:22:17Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:22:17Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:22:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:25:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:25:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:28:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:28:26Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:31:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:31:26Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
```
Environment:
- 3-node RKE2 cluster
- OS: RHEL 8.4
- K8s set up on Azure VMs
ArgoCD version: 2.0.1
Please let me know if any other info is required.
About this issue
- State: open
- Created 3 years ago
- Reactions: 24
- Comments: 42 (4 by maintainers)
I have the same problem in version 2.2.0.

Hello Argo community 😃
I am fairly familiar with the ArgoCD codebase and API, and I'd happily try to repay you for building such an awesome project by taking a stab at this issue, if there are no objections?
I just figured out what was causing Argo to freeze on the hook. In my case the specific hook had ttlSecondsAfterFinished: 0 defined in its spec. Through Kustomize I removed this field, and afterwards the chart finally went through! It's still a bug that should be addressed; I'm just sharing this for others to work around it.
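A minimal sketch of such a patch, assuming the hook is a Job named xxx-migrations rendered into a local file (file and resource names are illustrative, not the commenter's actual setup):

```yaml
# kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - rendered-chart.yaml       # pre-rendered chart output (illustrative)
patches:
  - target:
      kind: Job
      name: xxx-migrations    # illustrative hook job name
    patch: |-
      # JSON6902 op dropping the field the hook gets stuck on
      - op: remove
        path: /spec/ttlSecondsAfterFinished
```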
I'm seeing this issue with v2.6.1+3f143c9.
We started experiencing this issue after upgrading to 2.3.3. Before that we were on 2.2.3. I am not 100% sure, but I do not recall having any issues with 2.2.3.

I can confirm the error was fixed in 2.0.3. We recently upgraded to 2.3.3 and we are experiencing the error again.
@boedy You’re a Saint. I’ve been staring at envoyproxy/gateway for two weeks.
We had to completely exclude all Jobs from Argo CD via the global resource-exclusion config (a sketch follows below): https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/#resource-exclusioninclusion
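A minimal sketch of that config, following the format in the linked docs (scope it however you need):

```yaml
# argocd-cm snippet that hides all batch/Job resources on every cluster
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  resource.exclusions: |
    - apiGroups:
        - batch
      kinds:
        - Job
      clusters:
        - "*"
```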
And we migrated all Jobs in the repo to CronJobs with suspend: true (see the sketch below). BUT fair warning: due to a k8s bug, CronJobs may sometimes be triggered when their spec changes, INCLUDING a change from suspend: false to suspend: true (yes, it's stupid like that). I think it's this one, but there are others as well: https://github.com/kubernetes/kubernetes/issues/63371
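A minimal sketch of that suspended-CronJob setup (names and image are illustrative):

```yaml
# Suspended CronJob wrapping the migration Job; the schedule never fires
# while suspend is true, and runs are created manually instead
apiVersion: batch/v1
kind: CronJob
metadata:
  name: xxx-migrations
spec:
  schedule: "0 0 1 1 *"
  suspend: true
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: migrate
              image: xxx/migrations:latest   # illustrative image
```

A run can then be triggered on demand with kubectl create job --from=cronjob/xxx-migrations xxx-migrations-manual.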
Facing the same on v2.7.9 while (re)deploying https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack

Same issue in v1.3.3.7 and also in version v6.9… This issue was opened on Aug 2, 2021, and we are now in 2023; please bump this comment via emoji so I can see it in my inbox in 2042.
In all seriousness, it still happens on version 2.7.7.
We also had this issue and it was resolved once we set ARGOCD_CONTROLLER_REPLICAS (a sketch follows below). Instructions here: https://argo-cd.readthedocs.io/en/stable/operator-manual/high_availability/#argocd-application-controller
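A minimal sketch of what that looks like, per the linked HA docs (a replica count of 2 is just an example; the env value must match spec.replicas):

```yaml
# Scale the application controller and tell each replica the total count
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "2"   # keep in sync with spec.replicas
```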
We're seeing a similar issue on the SyncFail hook, which means we can't actually terminate the sync action.
The job doesn't exist in the target namespace, and we've tried to trick Argo by creating a job with the same name, namespace, and annotations we'd expect to see, with a simple echo "done" action (sketched below), but nothing is helping.
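For reference, a rough reconstruction of the decoy job we tried (the real manifest isn't shown here; name, namespace, and hook annotation are assumptions):

```yaml
# Dummy Job meant to impersonate the stuck SyncFail hook; it did not help
apiVersion: batch/v1
kind: Job
metadata:
  name: xxx-syncfail-hook    # assumed to match the hook name Argo CD reports
  namespace: xxx             # assumed target namespace
  annotations:
    argocd.argoproj.io/hook: SyncFail
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: done
          image: busybox
          command: ["sh", "-c", "echo done"]
```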
ArgoCD version:
{"Version":"v2.3.4+ac8b7df","BuildDate":"2022-05-18T11:41:37Z","GitCommit":"ac8b7df9467ffcc0920b826c62c4b603a7bfed24","GitTreeState":"clean","GoVersion":"go1.17.10","Compiler":"gc","Platform":"linux/amd64","KsonnetVersion":"v0.13.1","KustomizeVersion":"v4.4.1 2021-11-11T23:36:27Z","HelmVersion":"v3.8.0+gd141386","KubectlVersion":"v0.23.1","JsonnetVersion":"v0.18.0"}
@alexmt - We are using the below version of ArgoCD and seeing the same issue with the Contour Helm chart. The application is waiting for a PreSync Job to complete, whereas on the cluster I can see the job has completed.
{"Version":"v2.1.3+d855831","BuildDate":"2021-09-29T21:51:21Z","GitCommit":"d855831540e51d8a90b1006d2eb9f49ab1b088af","GitTreeState":"clean","GoVersion":"go1.16.5","Compiler":"gc","Platform":"linux/amd64","KsonnetVersion":"v0.13.1","KustomizeVersion":"v4.2.0 2021-06-30T22:49:26Z","HelmVersion":"v3.6.0+g7f2df64","KubectlVersion":"v0.21.0","JsonnetVersion":"v0.17.0"}
I suspect this is fixed by https://github.com/argoproj/argo-cd/pull/6294. The fix is available in https://github.com/argoproj/argo-cd/releases/tag/v2.0.3. Could you try upgrading, please?