argo-cd: Unable to add 1.24.0 Kubernetes cluster
Checklist:
- I’ve searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
- I’ve included steps to reproduce the bug.
- I’ve pasted the output of argocd version.
Describe the bug
When I tried to add a freshly created v1.24.0 Kubernetes cluster to argocd, I got a timeout (see the Logs for details). The cluster can’t be added.
Then I created a fresh v1.23.6 cluster, and I could add it successfully.
I’m using kubeadm to create my Kubernetes clusters. The only difference between the two creations is a single parameter passed to kubeadm init, namely --kubernetes-version.
Version
argocd: v2.3.3+07ac038
  BuildDate: 2022-03-30T01:46:59Z
  GitCommit: 07ac038a8f97a93b401e824550f0505400a8c84e
  GitTreeState: clean
  GoVersion: go1.17.6
  Compiler: gc
  Platform: linux/amd64
argocd-server: v2.3.3+07ac03
Logs
INFO[0001] ServiceAccount "argocd-manager" already exists in namespace "kube-system" 
INFO[0001] ClusterRole "argocd-manager-role" updated    
INFO[0002] ClusterRoleBinding "argocd-manager-role-binding" updated 
FATA[0032] Failed to wait for service account secret: timed out waiting for the condition
About this issue
- State: closed
- Created 2 years ago
- Reactions: 3
- Comments: 23 (9 by maintainers)
Just want to share my (hacky) workaround for this:
- Create a service account token Secret in the kube-system namespace, making sure that the annotation refers to the argocd-manager service account;
- Wait for data to be populated into the newly created secret;
- Run kubectl edit sa -n kube-system argocd-manager to manually add the secret to the service account.
With that, to fix this by the ‘short term solution’, we may need to not only create a service account token Secret, but also add the secret to the argocd-manager service account.
I do confirm. The issue appeared in OCP 4.11, which is based on Kubernetes 1.24. I would say this is a bug in Kubernetes, because I can see the same breakage with Prometheus in OpenShift: oc sa get-token prometheus-k8s -n openshift-monitoring did not work either.
So this means that parsing the token of an SA has changed since k8s 1.24.
Resolution
I experienced this issue on Argo CD v2.7.2.
The workaround was as described above in two separate posts.
For completeness, here is my solution.
My context was local testing with multiple clusters.
Steps to solve
Create a kind cluster with an apiServerAddress that is reachable by your Argo CD instance (not localhost). Most likely this is your host IP, e.g. “192.x.x.x:8443” (see the kind docs).
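A sketch of what the kind configuration for this step could look like, assuming kind's v1alpha4 config format. The address 192.168.1.10, the file name kind-dev.yaml, and the cluster name dev are placeholders, not values from the original post.

```shell
#!/bin/sh
# Placeholder address: replace with an IP that your Argo CD instance can reach.
HOST_IP="192.168.1.10"

# Write a kind config that exposes the API server on HOST_IP:8443
# instead of the default localhost binding.
cat > kind-dev.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  apiServerAddress: "${HOST_IP}"
  apiServerPort: 8443
EOF

echo "wrote kind-dev.yaml"

# Then create the cluster from it (requires kind; the name "dev" is an assumption):
#   kind create cluster --name dev --config kind-dev.yaml
```

Binding the API server to a routable address lets an Argo CD instance running outside the kind node reach it, at the cost of exposing the API server on your local network.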
Run the argocd command to add a cluster
It will fail with a timeout. That’s when you have to switch to the kind dev cluster context, create the additional secret for the service account, and associate the argocd-manager service account with the new secret.
In your dev-cluster context:
Create service account secret
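One way to create that secret, as a minimal sketch. The Secret name argocd-manager-token is an assumption; the annotation key and the Secret type are the standard Kubernetes ones for service account tokens. The script only prints the manifest unless APPLY=1 is set, in which case it pipes the manifest to kubectl apply.

```shell
#!/bin/sh
SA_NAME="argocd-manager"
SA_NAMESPACE="kube-system"
SECRET_NAME="argocd-manager-token"   # assumed name, pick any

# Emit a service account token Secret that is tied to SA_NAME
# through the kubernetes.io/service-account.name annotation.
token_secret_manifest() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${SECRET_NAME}
  namespace: ${SA_NAMESPACE}
  annotations:
    kubernetes.io/service-account.name: ${SA_NAME}
type: kubernetes.io/service-account-token
EOF
}

# Only touch the cluster when explicitly asked to; otherwise act as a dry run.
if [ "${APPLY:-0}" = "1" ]; then
  token_secret_manifest | kubectl apply -f -
else
  token_secret_manifest
fi
```

Once the Secret exists, Kubernetes fills in its data (token and CA certificate) for the annotated service account.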
Add secret to service account
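A sketch of the association step using kubectl patch instead of an interactive edit. It assumes a token Secret named argocd-manager-token already exists in kube-system; that name is an assumption, not taken from the original post. Without APPLY=1 the script only prints the command it would run.

```shell
#!/bin/sh
SA_NAME="argocd-manager"
SA_NAMESPACE="kube-system"
SECRET_NAME="argocd-manager-token"   # assumed name of the existing Secret

# Build the patch that lists the Secret under the service account's "secrets" field.
sa_secrets_patch() {
  printf '{"secrets":[{"name":"%s"}]}\n' "$1"
}

PATCH="$(sa_secrets_patch "${SECRET_NAME}")"

# Only touch the cluster when explicitly asked to; otherwise show the command.
if [ "${APPLY:-0}" = "1" ]; then
  kubectl patch serviceaccount "${SA_NAME}" -n "${SA_NAMESPACE}" -p "${PATCH}"
else
  echo "kubectl patch serviceaccount ${SA_NAME} -n ${SA_NAMESPACE} -p '${PATCH}'"
fi
```

After the patch, re-running the argocd command to add the cluster should find the secret on the service account instead of timing out.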
The fix was released in 2.3.7 and 2.4.0 onward.
It turns out the TokenRequest API is pretty straightforward to use. Here’s a hacky WIP commit to show what it looks like. I have tried both approaches (creating a Secret and using the TokenRequest API), and the TokenRequest API seems to resolve the issue. I still need to work through the best approach for maintaining backwards compatibility with the Secret approach for older versions of k8s.
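To illustrate what a TokenRequest looks like at the API level, here is a rough shell sketch. It is not the code from the WIP commit. The one-hour expiration is an assumed value, and the script only prints the request unless APPLY=1 is set, in which case it posts it through kubectl create --raw.

```shell
#!/bin/sh
SA_NAME="argocd-manager"
SA_NAMESPACE="kube-system"
EXPIRATION_SECONDS=3600   # assumed token lifetime

# The TokenRequest subresource lives under the service account itself.
token_request_path() {
  printf '/api/v1/namespaces/%s/serviceaccounts/%s/token\n' "$1" "$2"
}

# Minimal TokenRequest body: only the requested lifetime is set.
token_request_body() {
  printf '{"apiVersion":"authentication.k8s.io/v1","kind":"TokenRequest","spec":{"expirationSeconds":%s}}\n' "$1"
}

REQUEST_PATH="$(token_request_path "${SA_NAMESPACE}" "${SA_NAME}")"

# Only touch the cluster when explicitly asked to; otherwise show the request.
if [ "${APPLY:-0}" = "1" ]; then
  token_request_body "${EXPIRATION_SECONDS}" | kubectl create --raw "${REQUEST_PATH}" -f -
else
  echo "POST ${REQUEST_PATH}"
  token_request_body "${EXPIRATION_SECONDS}"
fi
```

With kubectl 1.24 or newer, kubectl create token argocd-manager -n kube-system is a shorter way to make the same kind of request. Tokens issued this way expire, unlike Secret-based tokens, which is part of why backwards compatibility needs thought.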
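This may be related to the changes to service account tokens in 1.24: starting with that release, Kubernetes no longer auto-generates Secret-based tokens for service accounts (the LegacyServiceAccountTokenNoAutoGeneration feature gate).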
The error is coming from here in code: https://github.com/argoproj/argo-cd/blob/8cd7d470e89212b085c03462c042925a1f52d3f2/util/clusterauth/clusterauth.go#L244