argo-cd: Unable to add 1.24.0 Kubernetes cluster

Checklist:

  • I’ve searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I’ve included steps to reproduce the bug.
  • I’ve pasted the output of argocd version.

Describe the bug

When I tried to add a freshly created v1.24.0 Kubernetes cluster to Argo CD, I got a timeout (see the Logs section for details). The cluster can’t be added.

I then created a fresh v1.23.6 cluster, and I could add it successfully.

I’m using kubeadm to create my Kubernetes clusters. The only difference between the two cluster creations is a single parameter passed to kubeadm init: --kubernetes-version.
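
For illustration, a minimal sketch of the two invocations; any flags beyond --kubernetes-version are omitted here and depend on your environment:

# Cluster that cannot be added to Argo CD:
kubeadm init --kubernetes-version v1.24.0

# Cluster that can be added successfully:
kubeadm init --kubernetes-version v1.23.6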

Version

argocd: v2.3.3+07ac038
  BuildDate: 2022-03-30T01:46:59Z
  GitCommit: 07ac038a8f97a93b401e824550f0505400a8c84e
  GitTreeState: clean
  GoVersion: go1.17.6
  Compiler: gc
  Platform: linux/amd64
argocd-server: v2.3.3+07ac03

Logs

INFO[0001] ServiceAccount "argocd-manager" already exists in namespace "kube-system" 
INFO[0001] ClusterRole "argocd-manager-role" updated    
INFO[0002] ClusterRoleBinding "argocd-manager-role-binding" updated 
FATA[0032] Failed to wait for service account secret: timed out waiting for the condition

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 23 (9 by maintainers)

Most upvoted comments

Just want to share my (hacky) workaround for this.

  1. Create a service account token Secret in the kube-system namespace, making sure that the annotation refers to the argocd-manager service account:
apiVersion: v1
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: argocd-manager
  name: argocd-manager-token
  namespace: kube-system
type: kubernetes.io/service-account-token
  2. Yes, Kubernetes 1.24 populates data into the newly created secret;
  3. But the secret is not associated with the service account; the service account still has 0 secrets:
root@ip-192-168-1-38:~# kubectl get sa -n kube-system
NAME                                 SECRETS   AGE
argocd-manager                       0         5d4h
  4. I did kubectl edit sa -n kube-system argocd-manager to manually add the secret to the service account:
secrets:
- name: argocd-manager-token
  5. Now the service account has 1 secret;
  6. And I can add the 1.24.0 cluster now:
root@ip-172-31-55-65:~# argocd cluster add --kubeconfig ./config_kyst_us-west-1 kyst-backend-us-west-1
WARNING: This will create a service account `argocd-manager` on the cluster referenced by context `kyst-backend-us-west-1` with full cluster level admin privileges. Do you want to continue [y/N]? y
INFO[0002] ServiceAccount "argocd-manager" already exists in namespace "kube-system" 
INFO[0002] ClusterRole "argocd-manager-role" updated    
INFO[0002] ClusterRoleBinding "argocd-manager-role-binding" updated 
FATA[0032] Failed to wait for service account secret: timed out waiting for the condition 
root@ip-172-31-55-65:~# argocd cluster add --kubeconfig ./config_kyst_us-west-1 kyst-backend-us-west-1
WARNING: This will create a service account `argocd-manager` on the cluster referenced by context `kyst-backend-us-west-1` with full cluster level admin privileges. Do you want to continue [y/N]? y
INFO[0001] ServiceAccount "argocd-manager" already exists in namespace "kube-system" 
INFO[0001] ClusterRole "argocd-manager-role" updated    
INFO[0001] ClusterRoleBinding "argocd-manager-role-binding" updated 
Cluster 'https://<hide-my-ip-here>:6443' added

So, for the ‘short term solution’, it is not enough to create a service account token Secret; the secret must also be added to the argocd-manager service account.
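
The two steps can also be done non-interactively; here is a sketch that assumes your kubectl context points at the v1.24 cluster and reuses the names from above:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: argocd-manager
  name: argocd-manager-token
  namespace: kube-system
type: kubernetes.io/service-account-token
EOF

# Associate the secret with the service account (what "kubectl edit sa" did above)
kubectl -n kube-system patch sa argocd-manager -p '{"secrets": [{"name": "argocd-manager-token"}]}'

# Then re-run "argocd cluster add"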

I can confirm this. The issue also appeared in OCP 4.11, which is based on Kubernetes 1.24. I would say this is a bug in Kubernetes, because the same behavior is broken with Prometheus in OpenShift: oc sa get-token prometheus-k8s -n openshift-monitoring did not work either.

So this means that the way a service account’s token is obtained has changed as of Kubernetes 1.24: token Secrets are no longer auto-generated for new service accounts.
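
The replacement mechanism is the TokenRequest API; since kubectl 1.24 it can be exercised directly, for example (shown here only to illustrate the behavior change):

# Requests a short-lived, signed JWT for the service account via the TokenRequest API
kubectl create token argocd-manager -n kube-system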

Resolution

# Assumes a Secret whose name contains "argocd-manager-token" already exists
# (created as in the workaround above)
ns=kube-system
# Find the name of the argocd-manager token secret
sa_token=$(kubectl -n "$ns" get secret | grep argocd-manager-token | awk '{print $1}')
# Associate the secret with the argocd-manager service account
kubectl -n "$ns" patch sa argocd-manager -p '{"secrets": [{"name": "'"${sa_token}"'"}]}'
# then run "argocd cluster add" command again

I experienced this issue on Argo CD v2.7.2

The workaround was as described above in two separate posts.

For completeness here is my solution.

My context was local testing with multiple clusters:

  • docker desktop k8s with Argo CD installed
  • kind dev cluster also locally

Steps to solve

Create a kind cluster with an apiServerAddress that is accessible to your Argo CD instance (not localhost), most likely your local IP, e.g. “192.x.x.x:8443”.

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: dev-cluster
networking:
  # WARNING: It is _strongly_ recommended that you keep this the default
  # (127.0.0.1) for security reasons. However it is possible to change this.
  apiServerAddress: "<your-local-ip>"
  # By default the API server listens on a random open port.
  # You may choose a specific port but probably don't need to in most cases.
  # Using a random port makes it easier to spin up multiple clusters.
  apiServerPort: 8443

kind docs ref

kind create cluster --config config.yaml

Run the argocd command to add a cluster

argocd cluster add kind-dev-cluster

It will fail with a timeout. That’s when you have to switch to the kind dev cluster context, create the additional secret for the service account, and associate the argocd-manager service account with the new secret.

In your dev-cluster context

kubectl config use-context kind-dev-cluster

Create service account secret

apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  annotations:
    kubernetes.io/service-account.name: argocd-manager
  name: argocd-manager-token
  namespace: kube-system
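
Apply the manifest above (the filename here is just an example):

kubectl apply -f argocd-manager-token.yaml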

Add secret to service account

apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2023-10-10T15:02:41Z"
  name: argocd-manager
  namespace: kube-system
  resourceVersion: "1526"
  uid: 89721095-63b2-42d0-8dd9-29c2f9fe0379
secrets:
- name: argocd-manager-token
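
If you prefer not to edit the ServiceAccount by hand, the same association can be made with a patch, followed by a quick check that it took effect (this mirrors the resolution snippet earlier in the thread):

kubectl -n kube-system patch sa argocd-manager -p '{"secrets": [{"name": "argocd-manager-token"}]}'
# SECRETS should now show 1
kubectl -n kube-system get sa argocd-manager
# Re-run the add; it should now succeed
argocd cluster add kind-dev-cluster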

The fix was released in 2.3.7 and 2.4.0 onward.

It turns out the TokenRequest API is pretty straightforward to use. Here’s a hacky WIP commit to show what it looks like. I have tried both approaches (creating the Secret and using the TokenRequest API), and the TokenRequest API seems to resolve the issue. I still need to work out the best approach for maintaining backwards compatibility with the Secret approach for older versions of Kubernetes.
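
For reference, a raw TokenRequest call from the command line looks roughly like this; the expiration value is an arbitrary example:

# POST a TokenRequest for the argocd-manager service account; the issued token
# is returned in .status.token of the response
kubectl create --raw "/api/v1/namespaces/kube-system/serviceaccounts/argocd-manager/token" -f - <<'EOF'
{
  "apiVersion": "authentication.k8s.io/v1",
  "kind": "TokenRequest",
  "spec": {"expirationSeconds": 3600}
}
EOF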