argo-cd: argocd login just hangs on 2.4.0

Checklist:

  • I’ve searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I’ve included steps to reproduce the bug.
  • I’ve pasted the output of argocd version.

Describe the bug

Running argocd login <server> --sso does nothing and just hangs. The exact same command works on 2.3.

To Reproduce

  • Download the 2.4.0 binary (https://github.com/argoproj/argo-cd/releases/tag/v2.4.0).
  • Run argocd login <server-name> --sso.
  • It doesn’t seem like it’s related to my specific Argo instance, because it even hangs if I run something like argocd login random.com --sso.
  • Adding --loglevel debug doesn’t show anything either.

Expected behavior

A browser tab is opened to go through the SSO login flow.

Screenshots

N/A

Version

argocd: v2.4.0+91aefab
  BuildDate: 2022-06-10T17:38:43Z
  GitCommit: 91aefabc5b213a258ddcfe04b8e69bb4a2dd2566
  GitTreeState: clean
  GoVersion: go1.18.3
  Compiler: gc
  Platform: darwin/amd64

Logs

N/A

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 9
  • Comments: 18 (4 by maintainers)

Most upvoted comments

@sklarsa Does it work if you use --skip-test-tls --grpc-web?

I believe the problematic section of code is the TestTLS section in login.go but for a surprising reason.

To reproduce this, I believe ArgoCD must be set up behind a load balancer or reverse proxy that does have TLS support but does not have full HTTP/2 support (e.g., a classic AWS ELB). This would ordinarily require the explicit (or implicit) use of the --grpc-web flag to proxy gRPC over HTTP/1.
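One rough way to check whether an endpoint offers HTTP/2 over TLS is to look at the ALPN protocol it selects during the handshake. The sketch below is an assumption on my part, not something from the ArgoCD codebase: negotiated_alpn and describe_alpn are hypothetical helper names, certificate verification is turned off because this is only a diagnostic, and an "h2" result does not by itself prove that gRPC works end to end.

```ruby
require "socket"
require "openssl"

# Return the ALPN protocol the server selects (e.g. "h2" or "http/1.1"),
# or nil if the server does not negotiate ALPN at all.
def negotiated_alpn(host, port = 443)
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.alpn_protocols = ["h2", "http/1.1"]
  # Diagnostic probe only: certificate verification is deliberately disabled.
  ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE

  tcp = TCPSocket.new(host, port)
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, ctx)
  ssl.hostname = host   # send SNI
  ssl.sync_close = true # closing the TLS socket also closes the TCP socket
  ssl.connect
  ssl.alpn_protocol
ensure
  ssl ? ssl.close : tcp&.close
end

# Turn the probe result into a hedged, human-readable hint.
def describe_alpn(proto)
  case proto
  when "h2" then "HTTP/2 negotiated; native gRPC may work"
  when nil  then "no ALPN negotiated; probably HTTP/1 only, --grpc-web may be needed"
  else "#{proto} negotiated; --grpc-web may be needed"
  end
end

# Usage (performs a real network connection, so it is left commented out):
#   puts describe_alpn(negotiated_alpn("argocd.example"))
```

If the endpoint does not select "h2", that would be consistent with the HTTP/1-only load balancer described above.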

However, the TestTLS section in login.go expects to probe gRPC over HTTP/2 directly. It does not change its behavior based on the presence of --grpc-web.

So given that, the first probe will successfully complete a TLS handshake, but attempting to speak HTTP/2 over the tunnel will fail if the load balancer or reverse proxy is expecting HTTP/1. It may, in fact, appear to “hang” waiting for an HTTP/1 request that will never come.

Eventually that fails or otherwise times out.

We then fall back to probing for an “insecure” (plaintext/non-TLS) connection. For reasons that are entirely baffling to me, I have observed that if you send the HTTP/2 connection preface (“magic”) and a SETTINGS frame in plaintext to a classic AWS ELB on a port configured to speak HTTPS, the load balancer will respond with its own SETTINGS frame in plaintext before closing the connection.

Here’s a small reproduction in Ruby:

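#!/usr/bin/env ruby
require "socket"

# Plain TCP connection (no TLS) to the load balancer's HTTPS port
s = TCPSocket.new("argocd.example", 443)

# HTTP/2 connection preface ("magic") followed by an empty SETTINGS frame:
# 3-byte length (0), type 0x04 (SETTINGS), flags 0x00, 4-byte stream ID (0)
s.write("PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n" + "\x00\x00\x00\x04\x00\x00\x00\x00\x00")
s.flush

# Print whatever the load balancer sends back before it closes the connection
print s.read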

In my setup with ArgoCD behind a classic AWS ELB, this is the response:

% ruby argocd.rb | xxd
00000000: 0000 0004 0000 0000 00                   .........

I also spun up an entirely new classic AWS ELB, and it performed this way too.

Those bytes are a valid HTTP/2 SETTINGS frame in plaintext 😕, despite being sent to a port otherwise configured to terminate HTTPS and despite the classic ELB not fully supporting HTTP/2.
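To double-check that reading of the bytes, here is a minimal sketch that decodes the 9-byte HTTP/2 frame header layout from RFC 7540, section 4.1 (24-bit length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream ID). parse_frame_header and FRAME_TYPES are names made up for this example, and the type table lists only a few frame types.

```ruby
# A few HTTP/2 frame type codes (RFC 7540, section 6); not exhaustive.
FRAME_TYPES = {
  0x0 => "DATA", 0x1 => "HEADERS", 0x3 => "RST_STREAM",
  0x4 => "SETTINGS", 0x6 => "PING", 0x7 => "GOAWAY"
}.freeze

# Decode the fixed 9-byte frame header at the start of +bytes+.
def parse_frame_header(bytes)
  raise ArgumentError, "need at least 9 bytes" if bytes.bytesize < 9

  b = bytes.byteslice(0, 9).unpack("C9")
  length = (b[0] << 16) | (b[1] << 8) | b[2] # 24-bit payload length
  # 32 bits on the wire; the top bit is reserved and masked off.
  stream_id = ((b[5] << 24) | (b[6] << 16) | (b[7] << 8) | b[8]) & 0x7fffffff

  {
    length: length,
    type: FRAME_TYPES.fetch(b[3], "UNKNOWN"),
    flags: b[4],
    stream_id: stream_id
  }
end

# The nine bytes shown in the xxd output above.
response = "\x00\x00\x00\x04\x00\x00\x00\x00\x00".b
p parse_frame_header(response)
```

For those bytes the header decodes to a SETTINGS frame with a zero-length payload, no flags, and stream ID 0, which is what an empty SETTINGS frame looks like.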

Even though the connection is closed immediately afterward due to the lack of full HTTP/2 support, receiving a valid frame is enough to trigger the onPrefaceReceipt hook. That result bubbles up through several callers (linked in the original comment) and eventually results in a connection marked as Ready (albeit briefly).

Because this plaintext probe succeeds, the user is prompted to accept a plaintext connection and the --plaintext flag is forced on if they confirm. But in reality, the server is not insecure, nor will it actually accept real traffic over plaintext.

So even if the user confirms --plaintext, we’ll later try to set up a plaintext gRPC connection to a TLS-protected port and that will fail or hang.

To confirm, I deleted the TestTLS probe, recompiled argocd, and was able to successfully use argocd login <server> --sso --grpc-web over TLS and HTTP/1.

I am not an expert on this codebase, but a couple of potential solutions might be:

  1. Add a flag to allow the user to bypass the TestTLS probe, if they are sure the flags they’ve provided are correct as-is.
  2. Remove the probe entirely, and trust that the user has provided the appropriate flags required to connect for their situation.
  3. Support --grpc-web when probing for TLS support in TestTLS.

I could probably contribute (1) or (2) if it’s acceptable, but (3) looks like a pretty big lift.
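To make option (1) concrete, here is a small Ruby model of the probe sequence described above with a bypass switch added. It is only a sketch under assumptions: choose_transport, the probe lambdas, and the returned symbols are hypothetical and do not come from the ArgoCD codebase, and the :error branch for the case where both probes fail is my guess at the behavior.

```ruby
# Hypothetical model of the login pre-flight check.
# tls_probe and plaintext_probe are callables that return true when the
# corresponding probe appears to succeed.
def choose_transport(tls_probe:, plaintext_probe:, skip_test_tls: false)
  # Option 1: trust the flags the user passed and skip probing entirely.
  return :as_configured if skip_test_tls

  return :tls if tls_probe.call

  # Fallback from the analysis: a lone plaintext SETTINGS frame from the
  # load balancer is enough to make this probe look successful.
  return :prompt_plaintext if plaintext_probe.call

  :error # assumed outcome when neither probe succeeds
end

# Classic ELB scenario: the HTTP/2-over-TLS probe fails, while the
# plaintext probe reports a false positive.
elb = { tls_probe: -> { false }, plaintext_probe: -> { true } }

p choose_transport(**elb)                      # prints :prompt_plaintext
p choose_transport(**elb, skip_test_tls: true) # prints :as_configured
```

With the bypass enabled, the misleading plaintext prompt is never reached and the connection is attempted with whatever flags the user supplied (for example --grpc-web).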

For operators of ArgoCD, I believe this could also be resolved by using a load balancer or reverse proxy that supports or passes through HTTP/2. Although I haven’t tested it yet, it might be possible that the AWS ELB SSL listener type (not HTTPS) would pass through HTTP/2 while still providing TLS termination (but also losing support for X-Forwarded-For and any other layer 7 features).

Hopefully this was a useful analysis. Let me know if I can clarify anything, and let me know what you think is the best solution.

/cc @crenshaw-dev @muma378

We have the same issue running argocd on AWS EKS. This has prevented us from updating the argocd CLI for quite some time.

I’ve implemented option 1 from the list above, adding an optional --skip-test-tls flag that skips the problematic check; see #10484.

It would be nice to get this merged, since I assume it is affecting multiple users. Otherwise, please let us know how we can continue.

Thanks a lot for your help!

same here

argocd: v2.4.3+471685f.dirty
  BuildDate: 2022-06-28T01:57:47Z
  GitCommit: 471685feae063c1c2e36a5ff268c4da87c697b85
  GitTreeState: dirty
  GoVersion: go1.18.3
  Compiler: gc
  Platform: darwin/arm64

I was able to spend some time looking into this yesterday, and I believe I have a hypothesis. But it will take me some time to verify and write up. I’ll try to get to it later today or tomorrow.

@muma378 unfortunately I’m not sure how to debug this one. Do you have any ideas?

I have no good ideas, except for using debugging tools like vscode and delve to step through the code and find out which line blocks.

Sorry for my mistake. I should have checked the case with correct configuration and address. 😢