argo-cd: Unable to add 1.24.0 Kubernetes cluster
Checklist:
- I’ve searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
- I’ve included steps to reproduce the bug.
- I’ve pasted the output of argocd version.
Describe the bug
When I tried to add a freshly created v1.24.0 Kubernetes cluster to Argo CD, I got a timeout (see Logs for details), and the cluster could not be added.
When I then created a fresh v1.23.6 cluster, I could add it successfully.
I’m using kubeadm to create my Kubernetes clusters. The only difference between the two setups is a single parameter passed to kubeadm init: --kubernetes-version.
Version
argocd: v2.3.3+07ac038
BuildDate: 2022-03-30T01:46:59Z
GitCommit: 07ac038a8f97a93b401e824550f0505400a8c84e
GitTreeState: clean
GoVersion: go1.17.6
Compiler: gc
Platform: linux/amd64
argocd-server: v2.3.3+07ac03
Logs
INFO[0001] ServiceAccount "argocd-manager" already exists in namespace "kube-system"
INFO[0001] ClusterRole "argocd-manager-role" updated
INFO[0002] ClusterRoleBinding "argocd-manager-role-binding" updated
FATA[0032] Failed to wait for service account secret: timed out waiting for the condition
About this issue
- State: closed
- Created 2 years ago
- Reactions: 3
- Comments: 23 (9 by maintainers)
Just want to share my (hacky) workaround for this:
- Create a service account token Secret in the kube-system namespace, making sure that the annotation refers to the argocd-manager service account;
- … data into the newly created secret;
- Run kubectl edit sa -n kube-system argocd-manager to manually add the secret to the service account.
With that, to fix this with the ‘short term solution’, we may need to not only create a service account token Secret, but also add the secret to the argocd-manager service account.
I can confirm this. The issue appeared in OCP 4.11, which is based on Kubernetes 1.24. I would say this is a bug in Kubernetes, because I can see the same behavior is broken with Prometheus in OpenShift: oc sa get-token prometheus-k8s -n openshift-monitoring did not work either.
So this means that the handling of a service account's token has changed since k8s 1.24.
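The Secret-based workaround described in this thread can be sketched as a small shell helper. This is only an illustration, not the Argo CD fix: the function name and the default Secret name (argocd-manager-token) are assumptions, while the Secret type and the kubernetes.io/service-account.name annotation are the standard Kubernetes mechanism for requesting a token Secret for a service account.

```shell
# Print a manifest for a service account token Secret.
# Arguments: <service-account> <namespace> [secret-name]
# The default secret name "<service-account>-token" is a hypothetical choice.
sa_token_secret_manifest() {
  sa_name="$1"
  namespace="$2"
  secret_name="${3:-${sa_name}-token}"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${secret_name}
  namespace: ${namespace}
  annotations:
    kubernetes.io/service-account.name: ${sa_name}
type: kubernetes.io/service-account-token
EOF
}

# Usage against a real cluster (not run here):
#   sa_token_secret_manifest argocd-manager kube-system | kubectl apply -f -
```

The helper only prints YAML, so the manifest can be inspected before piping it to kubectl.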
resolution
I experienced this issue on Argo CD v2.7.2.
The workaround was as described above in two separate posts.
For completeness, here is my solution.
My context was local testing with multiple clusters.
Steps to solve
- Create a kind cluster with an apiServerAddress that is accessible to your Argo CD instance (not localhost), most likely your IP, e.g. “192.x.x.x:8443” (see the kind docs).
- Run the argocd command to add a cluster.
- It will fail with a timeout. That’s when you have to switch to the kind dev cluster context, create the additional secret for the service account, and associate the argocd-manager service account with the new secret.
In your dev-cluster context:
- Create the service account secret.
- Add the secret to the service account.
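The commands for these steps did not survive in this copy of the thread, so the following is a rough sketch rather than the commenter's exact commands. It assumes a kind cluster named dev-cluster (context kind-dev-cluster), a placeholder address of 192.168.1.10:8443, and a token Secret with the hypothetical name argocd-manager-token that already exists in kube-system. The helper function names are made up for illustration.

```shell
# Print a kind cluster config that binds the API server to a reachable address.
# Arguments: <address> <port>; both are placeholders you must supply.
kind_cluster_config() {
  cat <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  apiServerAddress: "$1"
  apiServerPort: $2
EOF
}

# Print a patch that lists a token Secret on a ServiceAccount.
# Argument: <secret-name> (hypothetical, e.g. argocd-manager-token).
sa_secret_patch() {
  printf '{"secrets":[{"name":"%s"}]}\n' "$1"
}

# Usage against real clusters (not run here; all names are assumptions):
#   kind_cluster_config 192.168.1.10 8443 | kind create cluster --name dev-cluster --config=-
#   argocd cluster add kind-dev-cluster   # times out in the scenario described above
#   kubectl --context kind-dev-cluster -n kube-system patch serviceaccount argocd-manager \
#     --patch "$(sa_secret_patch argocd-manager-token)"
```

Using kubectl patch instead of kubectl edit keeps the step scriptable; either way the result is that the service account's secrets list names the new Secret.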
The fix was released in 2.3.7 and 2.4.0 onward.
It turns out the TokenRequest API is pretty straightforward to use. Here’s a hacky WIP commit to show what it looks like. I have tried both approaches (creating a Secret and using the TokenRequest API), and the TokenRequest API seems to resolve the issue. I still need to work through the best approach for maintaining backwards compatibility with the Secret approach for older versions of k8s.
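For readers unfamiliar with the TokenRequest API, here is a minimal sketch of what a request looks like. It is not the WIP commit mentioned above and makes no claim about how the released fix works. The helper names and the 3600-second expiry are assumptions; the endpoint path and the authentication.k8s.io/v1 TokenRequest kind are the standard Kubernetes API.

```shell
# Print the API path of the token subresource for a service account.
# Arguments: <namespace> <service-account>
token_request_path() {
  printf '/api/v1/namespaces/%s/serviceaccounts/%s/token\n' "$1" "$2"
}

# Print a TokenRequest body asking for a token with the given lifetime.
# Argument: <expiration-seconds> (3600 in the usage below is an arbitrary example).
token_request_body() {
  cat <<EOF
{
  "apiVersion": "authentication.k8s.io/v1",
  "kind": "TokenRequest",
  "spec": {
    "expirationSeconds": $1
  }
}
EOF
}

# With kubectl v1.24 or newer, the same API is reachable without crafting the
# request by hand (not run here):
#   kubectl create token argocd-manager -n kube-system --duration 1h
```

Tokens issued this way are time-limited, unlike the token stored in a service account token Secret, which is one reason the two approaches differ for backwards compatibility.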
This may be related to changes for serviceaccount token in 1.24:
The error is coming from here in code: https://github.com/argoproj/argo-cd/blob/8cd7d470e89212b085c03462c042925a1f52d3f2/util/clusterauth/clusterauth.go#L244