cloudstack: Kubernetes cluster sometimes ends up in Error state: fails to attach Binary ISO

ISSUE TYPE
  • Bug Report
COMPONENT NAME
CKS
CLOUDSTACK VERSION
4.18
CONFIGURATION

CKS 1.24.0 and 1.27.3

(I experienced the same problems with 1.26.0 as well, but did not include it in this bug report.)

I tested 6 different setups, combining 2 service offerings with the 2 CKS versions specified above. Every setup had 1 controller node and used the default Kubernetes isolated network offering.

SO1 (small):

  • 2 CPU cores
  • 2 GB RAM
  • 8 GB root disk

SO2 (big):

  • 4 CPU cores
  • 16 GB RAM
  • 64 GB root disk
  • test-124-small: 1 worker, SO 1, CKS 1.24.0
  • test-124-small-many: 10 workers, SO 1, CKS 1.24.0
  • test-1273-small: 1 worker, SO 1, CKS 1.27.3
  • test-1273-small-many: 10 workers, SO 1, CKS 1.27.3
  • test-1273-big: 1 worker, SO 2, CKS 1.27.3
  • test-1273-big-many: 10 workers, SO 2, CKS 1.27.3
OS / ENVIRONMENT

Ubuntu 22.04 nodes running KVM

SUMMARY

Creating Kubernetes clusters sometimes fails. Failures appear to be more common when creating many nodes rather than just a few.

I can access the nodes of the failed clusters through SSH just fine and am able to look at the logs, such as /var/log/daemon.log.

On test-1273-big-many-node1, the entries in daemon.log indicate that the issue is related to the binaries ISO not being attached.

Aug  8 09:30:21 systemvm cloud-init[1102]: Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 15 seconds, attempt: 99
Aug  8 09:30:36 systemvm cloud-init[1102]: Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 15 seconds, attempt: 100
Aug  8 09:30:51 systemvm cloud-init[1102]: Warning: Offline install timed out!

which I assume could cause the following entry:

Aug  8 09:30:52 systemvm deploy-kube-system[1420]: /opt/bin/deploy-kube-system: line 19: kubeadm: command not found

But that is not the case for every failed cluster; test-1273-small-many, for example, actually succeeds on some nodes:

Aug  8 09:43:10 systemvm cloud-init[1070]: Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 30 seconds, attempt: 8
Aug  8 09:43:41 systemvm cloud-init[1070]: Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 30 seconds, attempt: 9
Aug  8 09:44:11 systemvm cloud-init[1070]: Installing binaries from /mnt/k8sdisk/

But it fails on the 5th of the 10 worker nodes with the same error:

Aug  8 10:00:16 systemvm cloud-init[1072]: Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 30 seconds, attempt: 39
Aug  8 10:00:46 systemvm cloud-init[1072]: Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 30 seconds, attempt: 40
Aug  8 10:01:16 systemvm cloud-init[1072]: Warning: Offline install timed out!

The corresponding kubectl output:

➜  ~ kubectl get nodes                                      
NAME                                       STATUS   ROLES           AGE    VERSION
test-1273-small-many-control-189d4833e43   Ready    control-plane   3h7m   v1.27.3
test-1273-small-many-node-189d483a95b      Ready    <none>          3h7m   v1.27.3
test-1273-small-many-node-189d484150e      Ready    <none>          3h7m   v1.27.3
test-1273-small-many-node-189d4847014      Ready    <none>          3h7m   v1.27.3
test-1273-small-many-node-189d484b721      Ready    <none>          3h7m   v1.27.3
....
(there should be 6 more!)

All logs can be found at the bottom of this issue.

STEPS TO REPRODUCE
1. Create a cluster with one of the configurations above (via the UI or the API; see the CloudMonkey sketch after this list)
2. Wait for it to be created
3. Check if it worked or not
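
For reference, step 1 via CloudMonkey would look roughly like the sketch below. This is illustrative only: the UUIDs are placeholders for our environment and the parameter list is not exhaustive.

# Illustrative only: look up the available CKS versions and offerings, then create the cluster
cmk list kubernetessupportedversions
cmk list serviceofferings

cmk create kubernetescluster name=test-1273-small-many \
    zoneid=<zone-uuid> kubernetesversionid=<1.27.3-version-uuid> \
    serviceofferingid=<SO1-uuid> size=10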
EXPECTED RESULTS
Cluster ends up in Running state 
ACTUAL RESULTS
Cluster ends up in Error state

(dashboard screenshot)

LOGS

I supplied logs from /var/log/daemon.log. I added logs from the controller and the first worker.

For the clusters with more than 1 worker, I added the logs for the first worker that failed as well.

SUCCEEDED test-124-small-controller.txt test-124-small-node1.txt

FAILED test-124-small-many-controller.txt test-124-small-many-node1.txt test-124-small-many-node2.txt

FAILED test-1273-small-controller.txt test-1273-small-node1.txt

FAILED test-1273-small-many-controller.txt test-1273-small-many-node1.txt test-1273-small-many-node5.txt

SUCCEEDED test-1273-big-controller.txt test-1273-big-node1.txt

FAILED test-1273-big-many-controller.txt test-1273-big-many-node1.txt (failed on the first node)


Most upvoted comments

Hi,

I believe we have found the issue, and I would just like to report what happened.

In the tests I used 1.24.0 and 1.27.3 Kubernetes versions.

These existed on two secondary storages:

  • Storage 1: 1.24.0
  • Storage 2: 1.24.0 & 1.27.3

The issue was that Storage 1 was behaving incorrectly, either because of permissions or networking issues, which caused a node to get stuck in the mounting process if it decided to use that storage (which, according to @weizhouapache above, is a random choice).

So, when I tried to create clusters with 1.24.0 in my tests, any hypervisor that decided to mount the 1.24.0 ISO from Storage 1 would get stuck, and until it was restarted or the share was unmounted, any other cluster that needed binaries mounted on that hypervisor could fail. That is why clusters could fail even on versions that did not exist on the faulty secondary storage.

The solution was to remove the bad secondary storage and make sure to unmount it on every hypervisor.
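
For anyone else who runs into this, a rough way to spot (and clear) a hung secondary storage mount on a hypervisor is sketched below; the mount point is a placeholder, since the actual path is generated by CloudStack.

# List the NFS mounts on the hypervisor (same idea as df -Th | grep nfs further down)
mount | grep nfs

# A healthy share answers quickly; a hung one blocks here until the timeout fires
timeout 10 ls /mnt/<secondary-storage-mount-point> || echo "mount appears to be hung"

# Lazily detach the hung mount so new attach jobs on this hypervisor can succeed again
umount -l /mnt/<secondary-storage-mount-point>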

Now, I do believe this is the user’s fault and that NFS should be assumed to be correctly configured, but perhaps we should look into some sort of solution where I could have been informed, such as a log entry on the management server: “Failed to mount ISO on hypervisor. Server <address> might be misbehaving”.

Anyhow, we can close this issue, and I’d like to thank you guys for all the help. While it was not an issue with CloudStack, it gave great insight into the internal machinery, and we’ll be able to troubleshoot much more easily in the future.

@saffronjam

As I understand it, the ISO gets mounted temporarily on the hypervisor as long as some cluster node needs it?

Yes, that is correct. It is unmounted as soon as the ISO is detached from the Virtual Machines.

However, for some hypervisors I was unable to check this, since the command df -Th | grep nfs got stuck (it gave me no output and forced me to Ctrl+C), and the same happened with ls /mnt. This correlates with which cluster nodes either fail because of too many attempts waiting for /mnt/k8sdisk/ or just take longer to mount it (the hypervisor was se-flem-016, just for reference).
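
For reference, the hypervisor-side checks boil down to something like the following. The virsh check is an extra one I am adding here for completeness (KVM only); the instance name is a placeholder.

# Which NFS shares does the hypervisor currently have mounted?
df -Th | grep nfs
ls /mnt

# Which block devices are attached to a node VM? The binaries ISO typically shows up as a cdrom device.
virsh domblklist <instance-internal-name>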

Just like I expected. Sometimes it is hard to deal with NFS.

I tried to reboot the 016-hypervisor and I am now able to run df -Th | grep nfs.

You should have done # umount -l <path> (with the path taken from the output of # mount) instead of a reboot and then checked the problem. That does a “lazy” unmount.

@weizhouapache This leaves me with a question though. If the binaries fail to attach for any reason, how do we “reset” that cluster node so it tries to attach the ISO again?

Please check whether you can run # systemctl start setup-kube-system && systemctl start deploy-kube-system. The former should attempt to re-attach the ISO, and the latter will proceed with the deployment.
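
If you try that, following the unit logs afterwards should show whether /mnt/k8sdisk gets mounted this time; a minimal example:

# Re-run the CKS setup units on the failed node
systemctl start setup-kube-system && systemctl start deploy-kube-system

# Then check what the setup unit did
journalctl -u setup-kube-system.service -n 100 --no-pager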

Quick update: I was able to reproduce the issue, and the failing node was running on a hypervisor that can’t run the command df -Th | grep nfs, as described above. I will try to find some logs here.

Please inspect # dmesg -T or rsyslog. You’ll find hints there.
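
For example (the keywords below are only suggestions of what to grep for; the exact messages depend on the failure):

# Kernel messages around NFS / the attached ISO device
dmesg -T | grep -iE 'nfs|iso9660|sr0'

# The rsyslog equivalent on these Ubuntu nodes
grep -iE 'nfs|iso9660|sr0' /var/log/syslog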

Thanks, Jayanth

@weizhouapache How does ACS find which secondary storage to mount? Does it mount random secondary storage servers, or is there some order?

@zap51 ACS randomly selects a secondary storage and mounts the ISO from it to the k8s nodes. From what @saffronjam shared, the ISO has been attached to the k8s nodes successfully, but it works on some nodes and not on others. It may be better to double-check whether the working/failed nodes use the same ISO.

@weizhouapache Right, thanks! The ISO should be /dev/sr0 on the master and worker nodes then. @saffronjam would you also please confirm whether a device of type iso* is available on the worker and master nodes that are failing to find the binaries? I doubt the device itself is available on those nodes.

Thanks, Jayanth

CKS gets the device name of the attached ISO with the command blkid -o device -t TYPE=iso9660 (run sudo -i first). It is possible that the ISO is mounted but the above command returns an empty result. It would be good to check with blkid -o device -t LABEL=CDROM as well (see #7650).
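
On a stuck node, a manual check along those lines might look like the sketch below (assumes root; /dev/sr0 and /mnt/k8sdisk as mentioned above; the manual mount is only for debugging):

# Run as root (sudo -i first); either lookup may print e.g. /dev/sr0 if the ISO is attached
blkid -o device -t TYPE=iso9660
blkid -o device -t LABEL=CDROM

# If a device shows up but nothing is mounted at the expected path, mount it manually
mkdir -p /mnt/k8sdisk
mount -o ro /dev/sr0 /mnt/k8sdisk
ls /mnt/k8sdisk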

I’ll try to reproduce this issue and come back. @weizhouapache has provided the patch to only consider images / ISOs if the secondary storage has the image in READY state but it still doesn’t work.

@saffronjam would you be able to check whether the binary ISOs present on the secondary storage match across all the NFS shares (RW and RO)? A checksum comparison would be good. Also see if you can send the output of # journalctl -u setup-kube-system.service --no-pager, if feasible.
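
For the checksum comparison, something along these lines from any host that can reach both shares would do (all hosts and paths below are placeholders, not the actual layout):

# Mount both secondary storage shares read-only
mkdir -p /tmp/ss1 /tmp/ss2
mount -o ro <nfs-server-1>:/export/secondary /tmp/ss1
mount -o ro <nfs-server-2>:/export/secondary /tmp/ss2

# Compare the checksums of the CKS binaries ISO on each share
sha256sum /tmp/ss1/<path-to-binaries-iso> /tmp/ss2/<path-to-binaries-iso>

umount /tmp/ss1 /tmp/ss2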

Also, do we happen to see any network issues between the hypervisors and the NFS server? I haven’t taken a full look at the logs yet; I’ll come back. As far as I can see, this has something to do with the binary images, but I’m not sure.

Thanks, Jayanth

@zap51 good point, it is possible that the ISO is in the Ready state (in CloudStack) on the secondary storages but is actually not usable.

I created a new cluster to test this with 1 + 10 nodes. In this cluster,

  1. I can find the “org.apache.cloudstack.storage.command.AttachCommand” entries for every node, even the failed ones

  2. The logs and the CloudStack GUI report the jobs as finished (both attach and detach jobs):

Aug 14 12:27:37 se-flem-001 java[595738]: INFO  [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-14:ctx-63b2bc93 job-26834 ctx-15a4fa84) (logid:bc9fab70) Attached binaries ISO for VM : test-cluster-control-189f3756fa2 in cluster: test-cluster
Aug 14 12:27:37 se-flem-001 java[595738]: INFO  [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-14:ctx-63b2bc93 job-26834 ctx-15a4fa84) (logid:bc9fab70) Attached binaries ISO for VM : test-cluster-node-189f375d12e in cluster: test-cluster
... same for the nodes in between
Aug 14 12:27:43 se-flem-001 java[595738]: INFO  [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-14:ctx-63b2bc93 job-26834 ctx-15a4fa84) (logid:bc9fab70) Attached binaries ISO for VM : test-cluster-node-189f3787958 in cluster: test-cluster
Aug 14 12:27:43 se-flem-001 java[595738]: INFO  [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-14:ctx-63b2bc93 job-26834 ctx-15a4fa84) (logid:bc9fab70) Attached binaries ISO for VM : test-cluster-node-189f378ccf0 in cluster: test-cluster
Aug 14 12:56:11 se-flem-001 java[595738]: INFO  [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-14:ctx-63b2bc93 job-26834 ctx-15a4fa84) (logid:bc9fab70) Detached Kubernetes binaries from VM : test-cluster-control-189f3756fa2 in the Kubernetes cluster : test-cluster
Aug 14 12:56:12 se-flem-001 java[595738]: INFO  [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-14:ctx-63b2bc93 job-26834 ctx-15a4fa84) (logid:bc9fab70) Detached Kubernetes binaries from VM : test-cluster-node-189f375d12e in the Kubernetes cluster : test-cluster
... same for the nodes in between
Aug 14 12:56:54 se-flem-001 java[595738]: INFO  [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-14:ctx-63b2bc93 job-26834 ctx-15a4fa84) (logid:bc9fab70) Detached Kubernetes binaries from VM : test-cluster-node-189f3787958 in the Kubernetes cluster : test-cluster
Aug 14 12:56:56 se-flem-001 java[595738]: INFO  [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-14:ctx-63b2bc93 job-26834 ctx-15a4fa84) (logid:bc9fab70) Detached Kubernetes binaries from VM : test-cluster-node-189f378ccf0 in the Kubernetes cluster : test-cluster

I’ve tried to find relevant logs after the attach and detach jobs complete, but I only find one log entry after the last detach, saying that the job failed:

2023-08-14 12:56:56,831 INFO  [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-14:ctx-63b2bc93 job-26834 ctx-15a4fa84) (logid:bc9fab70) Detached Kubernetes binaries from VM : test-cluster-node-189f378ccf0 in the Kubernetes cluster : test-cluster
2023-08-14 12:56:56,832 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-14:ctx-63b2bc93 job-26834) (logid:bc9fab70) Complete async job-26834, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Failed to setup Kubernetes cluster : test-cluster in usable state as unable to provision API endpoint for the cluster"}

So it seems we get a timeout on the node (Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 15 seconds, attempt: 100) even though the logs say the attach/detach jobs run.
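
For reference, this is roughly how I searched for those entries (the log location may differ depending on how the management server is installed):

# Attach/detach entries for this cluster in the management server log
grep -E 'AttachCommand|Attached binaries ISO|Detached Kubernetes binaries' \
    /var/log/cloudstack/management/management-server.log | grep test-cluster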

We don’t have a build/test setup running; is there any guide or instructions on how we can test the patch?

We are running a management server with a database on a single node, then agents on a few other nodes. All installed with apt.

Do we shut down the current management server, build your changes into jar files, and then try to start that with the same settings, database, etc.?

@saffronjam if you use 4.18.0.0, here are the steps

# on any other server
git clone https://github.com/apache/cloudstack
cd cloudstack
git checkout 4.18.0.0
wget https://raw.githubusercontent.com/apache/cloudstack/737ac561f8b6dd1cefd17c2fdf82b5846aea20bf/server/src/main/java/com/cloud/template/TemplateManagerImpl.java -O server/src/main/java/com/cloud/template/TemplateManagerImpl.java
mvn -P developer,systemvm clean install
# copy client/target/cloud-client-ui-4.18.0.0.jar to /usr/share/cloudstack-management/lib/cloudstack-4.18.0.0.jar on the management server (back up the original first)
# restart the cloudstack-management service on the management server
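
Expanded, the last two commented steps would look roughly like this (the hostname is a placeholder; back up the original jar first):

# on the build host: copy the rebuilt jar to the management server
scp client/target/cloud-client-ui-4.18.0.0.jar root@<mgmt-server>:/tmp/

# on the management server: back up the original jar, swap in the new one, restart
cp /usr/share/cloudstack-management/lib/cloudstack-4.18.0.0.jar /root/cloudstack-4.18.0.0.jar.bak
cp /tmp/cloud-client-ui-4.18.0.0.jar /usr/share/cloudstack-management/lib/cloudstack-4.18.0.0.jar
systemctl restart cloudstack-management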