microk8s: root cert expired after a month, cluster does not respond anymore

Running microk8s inspect does not work, and neither does talking to the cluster. The error is: x509: certificate has expired or is not yet valid

How can I renew the root cert?

How can I make it last longer than a month?

About this issue

State: closed
Created 4 years ago
Reactions: 5
Comments: 21 (2 by maintainers)

Most upvoted comments

The script I have for now is here: https://gist.github.com/ktsakalozos/5de8d4c86c976eeef0242cc39fdf82b2

It would be great if anyone would run it and provide feedback.

curl https://gist.githubusercontent.com/ktsakalozos/5de8d4c86c976eeef0242cc39fdf82b2/raw/f29ff555346435154553d35ff64a8282f867011f/refresh-certs.sh -o refresh.sh
chmod +x refresh.sh
sudo ./refresh.sh

After running the script, the pods in the cluster should go into an unknown state and restart after a few seconds.

The intention is to place the above script in a microk8s.refresh-certs command to address this issue in affected deployments.

@balchua the kubeconfig files use tokens, but they also carry the ca.cert, which is why I think they need to be recreated.


ktsakalozos on Apr 28, 2020

I hit the same issue just now. I had to run refresh.sh and also had to give the coredns pod a kick; thank you @PeterSR for sharing that.

Everything seems to be back in working order; however, I cannot pull an image from a private repo now.

  Normal   Scheduled  17m                  default-scheduler  Successfully assigned homelab/newimage-66c8d88f65-lhvdz to kube
  Normal   Pulling    15m (x4 over 17m)    kubelet, kube      Pulling image "registry.gitlab.com/realg/kube/newimage:20.05"
  Warning  Failed     15m (x4 over 17m)    kubelet, kube      Failed to pull image "registry.gitlab.com/realg/kube/newimage:20.05": rpc error: code = Unknown desc = failed to resolve image "registry.gitlab.com/realg/kube/newimage:20.05": no available registry endpoint: failed to fetch anonymous token: unexpected status: 403 Forbidden
  Warning  Failed     15m (x4 over 17m)    kubelet, kube      Error: ErrImagePull
  Normal   BackOff    11m (x21 over 17m)   kubelet, kube      Back-off pulling image "registry.gitlab.com/realg/kube/newimage:20.05"
  Warning  Failed     113s (x65 over 17m)  kubelet, kube      Error: ImagePullBackOff

The image is definitely there: I can pull it with docker from another host using the same dockerconfig.json. I haven’t made any other changes to my cluster, which has me thinking that it’s related to refreshing the expired certs.

Has anyone had the same issue?

realG on May 26, 2020

@PeterSR thank you for the tip yet again! I was indeed missing imagePullSecrets in the deployment yaml.
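For anyone hitting the same 403 on a private registry, here is a rough sketch of adding the pull secret from the command line instead of editing the yaml. The namespace (homelab) and deployment name (newimage) are read off the events above; the secret name regcred is hypothetical, and the dockerconfig.json file is assumed to be in the current directory. The KUBECTL variable is also an assumption so the same function works with plain kubectl.

```shell
#!/bin/sh
# Command used to reach the cluster; override with KUBECTL=kubectl if preferred.
KUBECTL="${KUBECTL:-microk8s kubectl}"

# regcred is a hypothetical secret name; namespace and deployment come from
# the events quoted in this thread.
SECRET_NAME="${SECRET_NAME:-regcred}"
NAMESPACE="${NAMESPACE:-homelab}"
DEPLOYMENT="${DEPLOYMENT:-newimage}"

add_pull_secret() {
    # Create a registry secret from an existing Docker config file.
    $KUBECTL -n "$NAMESPACE" create secret generic "$SECRET_NAME" \
        --from-file=.dockerconfigjson=dockerconfig.json \
        --type=kubernetes.io/dockerconfigjson
    # Reference the secret from the deployment's pod template.
    $KUBECTL -n "$NAMESPACE" patch deployment "$DEPLOYMENT" \
        -p "{\"spec\":{\"template\":{\"spec\":{\"imagePullSecrets\":[{\"name\":\"$SECRET_NAME\"}]}}}}"
}
```

Calling add_pull_secret once should trigger a new rollout of the deployment, since the pod template changes.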

realG on May 26, 2020

Was facing the same issue. The refresh.sh script worked for me. Afterwards I was facing DNS resolution errors. All services would crash with errors similar to:

socket.gaierror: [Errno -3] Temporary failure in name resolution

To save others from 2 hours of debugging: make sure that coredns shows 1/1 ready in the output of kubectl -n kube-system get all. Its readiness probe had failed and the logs showed:

E0524 09:50:35.607082       1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Endpoints: Get https://10.152.183.1:443/api/v1/endpoints?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid

Deleting the pod (forcing it to restart) solved the issue for me.
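The check and restart described above can be sketched roughly like this. It assumes the coredns pods carry the label k8s-app=kube-dns (the usual label for the coredns deployment, but not confirmed anywhere in this thread) and that the cluster is reached through microk8s kubectl; both can be overridden.

```shell
#!/bin/sh
# Command used to reach the cluster; override with KUBECTL=kubectl if preferred.
KUBECTL="${KUBECTL:-microk8s kubectl}"
# Assumed label selector for the coredns pods.
DNS_SELECTOR="${DNS_SELECTOR:-k8s-app=kube-dns}"

# Show the READY column for coredns; it should read 1/1.
show_coredns() {
    $KUBECTL -n kube-system get pods -l "$DNS_SELECTOR"
}

# Delete the coredns pod so its deployment recreates it with the new certs.
restart_coredns() {
    $KUBECTL -n kube-system delete pod -l "$DNS_SELECTOR"
}
```

Run show_coredns first; if the pod is stuck at 0/1, run restart_coredns and check again after a few seconds.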

Got the idea to check kube-system from here and https://github.com/ubuntu/microk8s/issues/332#issue-413517185.

PeterSR on May 24, 2020

Also noting that the script @ktsakalozos provided fixes the issue for me. Thank you!

bmreading on May 7, 2020

Sorry for the late response. I tested the script above. The new certs are valid until the year 2030.

All pods went into an unknown state. About 30 seconds later, all pods came back up.
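For anyone who wants to confirm the expiry dates themselves, here is a minimal sketch using openssl. The certificate directory /var/snap/microk8s/current/certs is an assumption about the default snap layout (it is not stated in this thread), so adjust CERT_DIR if yours differs.

```shell
#!/bin/sh
# Assumed default location of the MicroK8s certificates.
CERT_DIR="${CERT_DIR:-/var/snap/microk8s/current/certs}"

# Print the expiry date of each .crt file and whether it is still valid.
check_certs() {
    for cert in "$CERT_DIR"/*.crt; do
        [ -f "$cert" ] || continue
        printf '%s: ' "$cert"
        # Prints a line of the form notAfter=<date>.
        openssl x509 -noout -enddate -in "$cert"
        # -checkend 0 exits non-zero if the certificate has already expired.
        if openssl x509 -noout -checkend 0 -in "$cert" >/dev/null; then
            echo "  still valid"
        else
            echo "  EXPIRED"
        fi
    done
}
```

Calling check_certs before and after the refresh should show the notAfter date moving forward. Reading the files may require sudo.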

Thanks for the help!

ThomasSchoenbeck on May 4, 2020