argo-cd: ArgoCD authentication handshake failed
Describe the bug
Hello ArgoCD Team!
We recently upgraded our ArgoCD instances and are now facing an annoying issue. From time to time, ArgoCD starts a sync and appears to hang somewhere. Then it works normally for a while, and then the problem hits again.
We also noticed that restarting argocd-repo-server helps for a short while.
We run one replica of argocd-repo-server and two replicas of argocd-server.
To Reproduce
N/A.
Expected behavior
ArgoCD does not hang during app sync.
Screenshots
Version
argocd: v1.8.1+c2547dc
BuildDate: 2020-12-10T02:57:57Z
GitCommit: c2547dca95437fdbb4d1e984b0592e6b9110d37f
GitTreeState: clean
GoVersion: go1.14.12
Compiler: gc
Platform: linux/amd64
Logs
time="2020-12-22T12:43:44Z" level=info msg="Sync operation to failed: ComparisonError: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: authentication handshake failed: remote error: tls: internal error\"" application=graph-db-graphdb-api-instances dest-namespace=graph-db dest-server="https://kubernetes.default.svc" reason=OperationCompleted type=Warning
About this issue
- State: closed
- Created 4 years ago
- Reactions: 26
- Comments: 38 (15 by maintainers)
Commits related to this issue
- fix: add liveness probe to restart repo server if it fails to server tls requests (#5110) Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com> — committed to alexmt/argo-cd by alexmt 4 years ago
- fix: add liveness probe to restart repo server if it fails to server tls requests (#5110) (#5119) Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com> — committed to argoproj/argo-cd by deleted user 3 years ago
- fix: add liveness probe to restart repo server if it fails to server tls requests (#5110) (#5119) Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com> Signed-off-by: Remington Breeze <re... — committed to argoproj/argo-cd by deleted user 3 years ago
- fix: add liveness probe to restart repo server if it fails to server tls requests (#5110) (#5119) Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com> — committed to shubhamagarwal19/argo-cd by deleted user 3 years ago
Hi,
We're currently facing the issue; here is the result of the command (run from inside a repo-server pod):
Hope this helps.
Edit: I solved the issue by deleting the repo-server pod, which was fine for that cluster since it has only a few apps. Our main cluster has more than 2,500 applications and 10 repo-servers, so I can't use that as a mitigation.
Hi Team,
We're seeing a similar error with v2.0.0. We set up Argo CD a few weeks ago and everything seemed fine for a while. Nothing is in production yet; we're just trying things out. Our cluster is on AWS EKS.
Argo CD v2.0.0+d085636 Build Date 2021-03-30T19:10:08Z Go Version go1.16 Go Compiler gc Platform linux/amd64 ksonnet v0.13.1 jsonnet v0.17.0 kustomize v3.9.4 2021-02-09T19:22:10Z Helm v3.5.1+g32c2223 kubectl v0.20.4
We're seeing this sync error:
@rumstead, I just published https://github.com/argoproj/argo-cd/releases/tag/v1.8.2
It includes a liveness probe that automatically restarts the repo-server.
We've been using a liveness probe for a long time; it is a pretty good workaround and buys us some time. Next, we will try upgrading the gRPC and Go versions.
Yes, we are facing this internally as well. We suspect the upgrade of the gRPC libraries may have caused this but haven't confirmed it. In an internal build, we added a gRPC health check that kills the repo-server when this happens; that allowed us to recover, but it doesn't address the root cause.
@alexmt Thanks for the patch! Would you kindly update the comment to use the image alpine:3.8 rather than docker.intuit.com/oicp/alpine3.8:latest, so other users don't hit a 403 Unauthorized? Thanks! 😊
We saw this from 1.7 -> 1.8 as well.
We upgraded from 1.7 to 1.8, and I can confirm that 1.7 does not have this issue.
OK, this one is interesting. Apparently, for the openssl client this error is transient, while for our Go/gRPC client it is not. Will investigate further.
Thank you for this additional information, @Issif