argo-cd: ArgoCD authentication handshake failed

Describe the bug

Hello ArgoCD Team!

We recently upgraded our ArgoCD instances and are now facing an annoying issue. From time to time, ArgoCD starts a sync and appears to hang somewhere. It then works normally for a while before the problem hits again.

We also noticed that restarting argocd-repo-server helps for a short while.

We have one replica of argocd-repo-server and two replicas of argocd-server.

To Reproduce

N/A.

Expected behavior

ArgoCD does not hang during app sync.

Screenshots

Screenshot 2020-12-22 at 16 03 04

Version

argocd: v1.8.1+c2547dc
  BuildDate: 2020-12-10T02:57:57Z
  GitCommit: c2547dca95437fdbb4d1e984b0592e6b9110d37f
  GitTreeState: clean
  GoVersion: go1.14.12
  Compiler: gc
  Platform: linux/amd64

Logs

time="2020-12-22T12:43:44Z" level=info msg="Sync operation to  failed: ComparisonError: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: authentication handshake failed: remote error: tls: internal error\"" application=graph-db-graphdb-api-instances dest-namespace=graph-db dest-server="https://kubernetes.default.svc" reason=OperationCompleted type=Warning

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 26
  • Comments: 38 (15 by maintainers)

Most upvoted comments

Hi,

We’re currently facing this issue. Here is the output of the command, run from inside a repo-server pod:


argocd@argocd-repo-server-c4d8c7f6b-cwhjj:~$ openssl s_client -host localhost -port 8081

CONNECTED(00000003)
Can't use SSL_get_servername
depth=0 O = Argo CD
verify error:num=18:self signed certificate
verify return:1
depth=0 O = Argo CD
verify return:1
140146647491712:error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error:../ssl/record/rec_layer_s3.c:1544:SSL alert number 80
---
Certificate chain
 0 s:O = Argo CD
   i:O = Argo CD
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIDETCCAfmgAwIBAgIRAKGoOTg9C6uM6wu2GonmFT4wDQYJKoZIhvcNAQELBQAw
EjEQMA4GA1UEChMHQXJnbyBDRDAeFw0yMDEyMjIxNjMyMjRaFw0yMTEyMjIxNjMy
MjRaMBIxEDAOBgNVBAoTB0FyZ28gQ0QwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAw
ggEKAoIBAQCsshiDMHQ61J55+0AOApeCY4xgmbQ+/+YqWKQDbRpBbHdDEgf4Bux8
Ij1XUBgC4iTlI6LNu1rXgt3wIr3peeiNsrjcHR73C0A5WVAPsbD+ueyAj9mdA1RM
Iq70WwvdJv7ITGSCfJ2di4gV3AeMVO2qBH5A5GOtIaVt4dbskvG4i7cvvr+Rvnrq
4xX1jMbjh1pkhCnXrNhnyxPrwHgYV0Lz4+eeirMsJD603OJgekYDHGT/v9AtnW/8
KN/6VGHy4qvjjlms7wXaeANK1wu9h0dfublFuC8jvr/BUdHdMqZ71vn/FluRVdqa
smAHQWO2DxdbR2CHXuTuLaKdZYTwOso3AgMBAAGjYjBgMA4GA1UdDwEB/wQEAwIC
pDATBgNVHSUEDDAKBggrBgEFBQcDATAPBgNVHRMBAf8EBTADAQH/MCgGA1UdEQQh
MB+CCWxvY2FsaG9zdIISYXJnb2NkLXJlcG8tc2VydmVyMA0GCSqGSIb3DQEBCwUA
A4IBAQCargXh/niqJcbKZGkhDp7SY72Fmy9wSjnfSALOJtiomHeAt2kuOmmcu8v6
B62xIHYHMIU/bVecV4CgdyoVOeNmA9Hs3UUuIMBWWuCPFnUJUIpijY34/xdYceXB
AHX8OGmjY/VdLQgRM5fQg+ufZiqNRRPnB9uxxzpqy1VxGKetoXdCzfATmIsNh32N
0otUE2PGEufM01ggJWD3sUoKewlBHPmyAocEzDDLFVsQdKfFFB4PsYPCDZvlH2mq
CoDvRcSQ2Y2D7U6DlzQOrDBhqVMmpC9GHp8wHtH+rMgo1ZJyAfZgIvYJTgdfn2HC
LqFFE09tUWyK8ZZSIhLtSE53QZsZ
-----END CERTIFICATE-----
subject=O = Argo CD

issuer=O = Argo CD

---
No client certificate CA names sent
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 1009 bytes and written 283 bytes
Verification error: self signed certificate
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 18 (self signed certificate)
---

Hope this helps.

Edit: I solved the issue by deleting the repo-server pod. That was acceptable for that cluster, which has only a few apps. Our main cluster has more than 2500 applications and 10 repo-servers, so I can’t use that as a mitigation there.
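For anyone who wants to script the `openssl s_client` check above, here is a minimal Go sketch using only the standard library. It is not taken from Argo CD. The `probeTLS` helper, the `defaultTimeout` constant, and the default address `localhost:8081` are assumptions for illustration, and certificate verification is skipped because the certificate shown above is self-signed.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"os"
	"time"
)

// defaultTimeout bounds both the TCP dial and the TLS handshake.
const defaultTimeout = 3 * time.Second

// probeTLS opens a TCP connection to addr and performs a TLS handshake.
// It returns nil on success and the wrapped dial/handshake error otherwise.
func probeTLS(addr string, timeout time.Duration) error {
	dialer := &net.Dialer{Timeout: timeout}
	cfg := &tls.Config{
		// Assumption: the server certificate is self-signed (see the
		// "verify error:num=18" line above), so verification is skipped.
		InsecureSkipVerify: true,
	}
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, cfg)
	if err != nil {
		return fmt.Errorf("TLS handshake with %s failed: %w", addr, err)
	}
	defer conn.Close()
	return nil
}

func main() {
	// Hypothetical default: the address probed with openssl above.
	addr := "localhost:8081"
	if len(os.Args) > 1 {
		addr = os.Args[1]
	}
	if err := probeTLS(addr, defaultTimeout); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("OK: handshake with", addr, "succeeded")
}
```

Running it in a loop from inside the pod should show whether the handshake failure is persistent or intermittent. To use something like this as a health check, it would also need to exit with a non-zero status on failure.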

Hi Team,

We’re seeing a similar error with v2.0.0. We set up Argo CD a few weeks ago and everything seemed fine for a while. Nothing is in production yet; we’re just trying things out. Our cluster runs on AWS EKS.

Argo CD: v2.0.0+d085636
  Build Date: 2021-03-30T19:10:08Z
  Go Version: go1.16
  Go Compiler: gc
  Platform: linux/amd64
  ksonnet: v0.13.1
  jsonnet: v0.17.0
  kustomize: v3.9.4 2021-02-09T19:22:10Z
  Helm: v3.5.1+g32c2223
  kubectl: v0.20.4

We’re seeing this sync error:

argo-cd-connection-error

@rumstead, we just published https://github.com/argoproj/argo-cd/releases/tag/v1.8.2

It includes a liveness probe that automatically restarts the repo-server.

We’ve been using the liveness probe for a long time. It is a pretty good workaround and buys us some time. Next, we will try upgrading the gRPC and Go versions.

Yes, we are facing this internally as well. We suspect the upgrade of the gRPC libraries may have caused this, but we haven’t confirmed it. In an internal build, we put in place a gRPC health check that kills the repo-server when this happens. That allowed us to recover, but it doesn’t address the root cause.

We’ve added a livenessProbe that restarts the repo-server as a workaround. Here is the merge patch that introduces the probe:

apiVersion: apps/v1
[...]
      initContainers:
      - name: download-grpc-health-probe
        image: docker.intuit.com/oicp/alpine3.8:latest

@alexmt Thanks for the patch! Would you kindly update the comment to use the image alpine:3.8 rather than docker.intuit.com/oicp/alpine3.8:latest, so other users don’t hit a 403 Unauthorized? Thanks! 😊

We saw this when going from 1.7 to 1.8 as well.

So, just to gather more information: this issue started for people after upgrading to 1.8, right?

May I ask which exact versions you were upgrading from that were not affected by this issue? We updated Go in v1.7.9, and it would be interesting to know whether this and later 1.7 versions are affected as well.

We upgraded from 1.7 to 1.8, and I can confirm that 1.7 does not have this issue.

OK, this one is interesting

140146647491712:error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error:../ssl/record/rec_layer_s3.c:1544:SSL alert number 80

Apparently, for the openssl client this error is transient, while for our Go/gRPC client it is not.

Will investigate further.

Thank you for this additional information, @Issif