microk8s: [BUG] Microk8s crashes when joining a node using ha-cluster

I’m not sure whether the problem occurs because my master node is an Ubuntu machine while the worker is a Windows 10 Enterprise machine with WSL enabled, but I thought this might be of interest.

Version: 1.19/stable

Steps to reproduce:

  1. Checking with microk8s status and microk8s inspect before joining the cluster, everything appears to be fine.
  2. The ha-cluster add-on is enabled on both the master and the worker node.
  3. Running microk8s join x.x.x.x:25000/{TOKEN} on the worker makes microk8s crash silently (the command sequence is sketched below).

No error message is output.
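For context, the join procedure I followed is roughly the following (IP and token are placeholders; the actual join command, including the token, is printed by microk8s add-node on the master):

# on the master: generate a token and print the matching join command
microk8s add-node

# on the worker: join using the printed address/token -- this is the step that crashes microk8s here
microk8s join x.x.x.x:25000/{TOKEN}

# on the master: verify that the worker shows up
microk8s kubectl get nodes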

Output of microk8s status before joining:

microk8s is running
high-availability: no
  datastore master nodes: 127.0.0.1:19001
  datastore standby nodes: none
addons:
  enabled:
    ha-cluster           # Configure high availability on the current node
  disabled:
    ambassador           # Ambassador API Gateway and Ingress
    cilium               # SDN, fast with full network policy
    dashboard            # The Kubernetes dashboard
    dns                  # CoreDNS
    fluentd              # Elasticsearch-Fluentd-Kibana logging and monitoring
    gpu                  # Automatic enablement of Nvidia CUDA
    helm                 # Helm 2 - the package manager for Kubernetes
    helm3                # Helm 3 - Kubernetes package manager
    host-access          # Allow Pods connecting to Host services smoothly
    ingress              # Ingress controller for external access
    istio                # Core Istio service mesh services
    jaeger               # Kubernetes Jaeger operator with its simple config
    knative              # The Knative framework on Kubernetes.
    kubeflow             # Kubeflow for easy ML deployments
    linkerd              # Linkerd is a service mesh for Kubernetes and other frameworks
    metallb              # Loadbalancer for your Kubernetes cluster
    metrics-server       # K8s Metrics Server for API access to service metrics
    multus               # Multus CNI enables attaching multiple network interfaces to pods
    prometheus           # Prometheus operator for monitoring and logging
    rbac                 # Role-Based Access Control for authorisation
    registry             # Private image registry exposed on localhost:32000
    storage              # Storage class; allocates storage from host directory

Output of microk8s inspect before joining:

Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-control-plane-kicker is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting juju
  Inspect Juju
Inspecting kubeflow
  Inspect Kubeflow

Output of join (finishes without further output):

Contacting cluster at 10.10.40.24
Waiting for this node to finish joining the cluster. .. .. .. .. .. .. .. .. .. ..

Output of microk8s status after joining:

microk8s is not running. Use microk8s inspect for a deeper inspection.

Output of microk8s inspect after joining:

Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
 FAIL:  Service snap.microk8s.daemon-apiserver is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-apiserver
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-control-plane-kicker is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting juju
  Inspect Juju
Inspecting kubeflow
  Inspect Kubeflow

Building the report tarball
  Report tarball is at /var/snap/microk8s/1791/inspection-report-20201211_113621.tar.gz
An error occurred when trying to execute 'sudo microk8s.inspect' with 'multipass': returned exit code 1.

And as you can imagine, the node is not added on the master.

I reinstalled microk8s and removed the VM. Everything then seemed to be fine again, but after trying to join, microk8s crashed again:

FAIL: Service snap.microk8s.daemon-apiserver is not running

Approximately 15 minutes later, microk8s seemed to be up and running again (though the api-server was still down). After trying to join the cluster once more, I received a Python stacktrace. It may just be a consequence of the api-server being down, but I’m appending it here just in case.

Contacting cluster at 10.10.40.24
Traceback (most recent call last):
  File "/snap/microk8s/1791/scripts/cluster/join.py", line 967, in <module>
    join_dqlite(connection_parts)
  File "/snap/microk8s/1791/scripts/cluster/join.py", line 900, in join_dqlite
    update_dqlite(info["cluster_cert"], info["cluster_key"], info["voters"], hostname_override)
  File "/snap/microk8s/1791/scripts/cluster/join.py", line 818, in update_dqlite
    with open("{}/info.yaml".format(cluster_backup_dir)) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/var/snap/microk8s/1791/var/kubernetes/backend.backup/info.yaml'
An error occurred when trying to execute 'sudo microk8s.join 10.10.40.24:25000/04f6ac0ea469893c594e5b30954618f0' with 'multipass': returned exit code 1.
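Judging from the traceback, update_dqlite in join.py expects a backup directory containing info.yaml that apparently was never created on this node. These read-only checks would confirm it (paths taken from the traceback; as far as I understand, backend is the live dqlite data directory that the join script copies aside, but that part is an assumption):

# does the backup directory the script expects exist, and does it contain info.yaml?
ls -l /var/snap/microk8s/1791/var/kubernetes/backend.backup/

# assumption: this is the live dqlite directory the backup should have been made from
ls -l /var/snap/microk8s/1791/var/kubernetes/backend/

# apiserver logs, as already suggested by microk8s inspect above
sudo journalctl -u snap.microk8s.daemon-apiserver --no-pager | tail -n 50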

NOTE: Worked around in the meantime by disabling the ha-cluster add-on on both nodes. Would be great if this issue could be fixed soon!
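For anyone hitting the same problem, the workaround amounts to something like the following, run on both nodes (note that it gives up HA clustering entirely):

# disable the ha-cluster add-on on this node
microk8s disable ha-cluster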

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 24

Most upvoted comments

Ended up using Kubernetes natively and now everything seems fine.

@devZer0 can you upload the inspect tarball? Thanks