kepler: Unable to deploy Kepler in GKE with Helm chart if nodes are Container-Optimized OS (COS)
What happened?
If we use the Helm chart to deploy Kepler, the kepler-exporter pod stays in status “ContainerCreating” when the nodes of the GKE (Google Kubernetes Engine) cluster run COS-containerd (COS = Container-Optimized OS), because the chart tries to mount a hostPath volume at /usr/src, as shown below. We can deploy Kepler manually with the manifests and kubectl, although the Prometheus operator has to be deployed before kepler-exporter; otherwise we get the error: “ensure CRDs are installed first”.
The Helm chart works smoothly if the GKE nodes are Ubuntu-containerd.
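Since the behaviour depends on the node image, it can help to check which image family each node runs before installing the chart. The following is a minimal sketch, assuming the node's reported OS image string matches the PRETTY_NAME shown in the "OS version" section of this report; the function name and the labels "cos", "ubuntu" and "other" are illustrative, not GKE terminology.

```shell
#!/bin/sh
# Classify a node's OS image string (as reported in .status.nodeInfo.osImage).
# The labels printed here are illustrative names, not GKE terms.
node_image_family() {
  case "$1" in
    *"Container-Optimized OS"*) echo "cos" ;;
    *Ubuntu*)                   echo "ubuntu" ;;
    *)                          echo "other" ;;
  esac
}

# Example with the PRETTY_NAME from this report:
node_image_family "Container-Optimized OS from Google"   # prints: cos

# Against a live cluster (requires kubectl access):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.status.nodeInfo.osImage}{"\n"}{end}' |
#     while IFS= read -r image; do node_image_family "$image"; done
```

Nodes reported as "cos" are the ones where the /usr/src mount described in this issue would be expected to fail.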
After typing:
kubectl -n kepler get pods
kubectl -n kepler describe pod <a_pod_from_previous_command>
we get something like this:
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType: DirectoryOrCreate
tracing:
Type: HostPath (bare host directory volume)
Path: /sys
HostPathType: Directory
proc:
Type: HostPath (bare host directory volume)
Path: /proc
HostPathType: Directory
usr-src:
Type: HostPath (bare host directory volume)
Path: /usr/src
HostPathType: Directory
(...)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 36m default-scheduler Successfully assigned kepler/kepler-5rptn to gke-tf-poc-79r5-tf-poc-grackle-493f7a92-8nz5
Warning FailedMount 25m kubelet Unable to attach or mount volumes: unmounted volumes=[usr-src], unattached volumes=[lib-modules tracing proc usr-src kube-api-access-djkwx]: timed out waiting for the condition
Warning FailedMount 5m53s (x23 over 36m) kubelet MountVolume.SetUp failed for volume "usr-src" : hostPath type check failed: /usr/src is not a directory
This means that the kubelet could not mount the usr-src volume: it is declared as a hostPath of type Directory, and /usr/src does not exist on COS nodes, only on Ubuntu (20.04+ LTS) nodes.
On an Ubuntu node:
juan@node:$ ls /usr/src
linux-gcp-5.19-headers-5.19.0-1026
linux-headers-5.19.0-1026-gcp
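To check a node directly, the test behind a hostPath volume of type Directory can be approximated from a shell on that node. This is a rough sketch, not kubelet's actual implementation; the function name `hostpath_dir_status` and its three status labels are made up for illustration.

```shell
#!/bin/sh
# Report how a path would fare against a hostPath volume of type "Directory",
# which requires a directory to already exist at that path.
# Approximation only: this is not the kubelet's own check.
hostpath_dir_status() {
  if [ -d "$1" ]; then
    echo "directory"
  elif [ -e "$1" ]; then
    echo "not-a-directory"
  else
    echo "missing"
  fi
}

# Run on the node itself (for example over SSH), not inside a pod:
hostpath_dir_status /usr/src
```

On the Ubuntu node above this should report "directory". Any other result on a COS node would be consistent with the FailedMount event.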
What did you expect to happen?
A smooth and complete deployment of kepler-exporter. 😃
How can we reproduce it (as minimally and precisely as possible)?
- Create a GKE cluster (version 1.25+) in GCP with COS nodes
- Get the kubeconfig or the credentials to connect to the cluster with a command like:
gcloud container clusters get-credentials <cluster_name> --zone=<cluster_gcp_zone>
- Follow the installation process with the Helm chart in the Kepler documentation:
helm repo add kepler https://sustainable-computing-io.github.io/kepler-helm-chart
helm install kepler kepler/kepler --namespace kepler --create-namespace
- Check the status of the pod with
kubectl -n kepler get pods
kubectl -n kepler describe pod <a_pod_from_previous_command>
or list the events with
kubectl -n kepler get events --sort-by=.metadata.creationTimestamp
You will find this message in the pod description:
Unable to attach or mount volumes: unmounted volumes=[usr-src], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
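When the describe or events output is long, the failing volume names can be pulled out with a small filter. The sketch below assumes the event text has the same "MountVolume.SetUp failed for volume" wording as the event quoted earlier in this report; `failed_mount_volumes` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the unique names of volumes named in "MountVolume.SetUp failed" events.
# Reads `kubectl describe pod` (or `kubectl get events`) output on stdin.
failed_mount_volumes() {
  sed -n 's/.*MountVolume\.SetUp failed for volume "\([^"]*\)".*/\1/p' | sort -u
}

# Example using the event line quoted in this report:
printf '%s\n' 'Warning FailedMount 5m53s (x23 over 36m) kubelet MountVolume.SetUp failed for volume "usr-src" : hostPath type check failed: /usr/src is not a directory' |
  failed_mount_volumes   # prints: usr-src

# Against a live cluster (requires kubectl access):
#   kubectl -n kepler describe pod <a_pod_from_previous_command> | failed_mount_volumes
```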
Anything else we need to know?
Just in case it matters: GKE deploys Ubuntu 22.04 LTS if the GKE version is at least 1.25.8-gke.500 in the Regular channel. The cgroups on these nodes are v2 (cgroup2fs):
juan@node:$ stat -fc %T /sys/fs/cgroup/
cgroup2fs
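For reference, the output of that stat command can be mapped to a cgroup version as follows. This is a small sketch based on the convention that `cgroup2fs` indicates cgroup v2 and `tmpfs` indicates cgroup v1; the helper name `cgroup_version` is illustrative.

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup version.
cgroup_version() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# Example with the value reported above:
cgroup_version "cgroup2fs"   # prints: v2

# On a node:
#   cgroup_version "$(stat -fc %T /sys/fs/cgroup/)"
```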
Kepler image tag
$ helm search repo kepler
NAME CHART VERSION APP VERSION DESCRIPTION
kepler/kepler 0.4.2 release-0.5 A Helm chart for kepler (Kubernetes-based Effic...
Kubernetes version
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-15T02:15:11Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2-gke.1200", GitCommit:"5319597f0ffe6e93e83a51e280d81fb2028bf4a0", GitTreeState:"clean", BuildDate:"2023-06-01T19:54:16Z", GoVersion:"go1.20.4 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl version --short
Client Version: v1.27.3
Kustomize Version: v5.0.1
Server Version: v1.27.2-gke.1200
Cloud provider or bare metal
OS version
# On Linux:
$ cat /etc/os-release
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
GOOGLE_CRASH_ID=Lakitu
GOOGLE_METRICS_PRODUCT_ID=26
KERNEL_COMMIT_ID=3f78393e3888bca4fdd7aae9d405f27017334f87
VERSION=105
VERSION_ID=105
BUILD_ID=17412.101.13
$ uname -a
Linux gke-tf-poc-7cxy-tf-poc-wasp-85ae4487-p6lh 5.15.109+ #1 SMP Sat May 20 10:48:19 UTC 2023 x86_64 Intel(R) Xeon(R) CPU @ 2.20GHz GenuineIntel GNU/Linux
Install tools
Kepler deployment config
On Kubernetes:
$ KEPLER_NAMESPACE=kepler
# provide kepler configmap
$ kubectl get configmap kepler-cfm -n ${KEPLER_NAMESPACE}
Error from server (NotFound): configmaps "kepler-cfm" not found
# provide kepler deployment description
$ kubectl describe deployment kepler-exporter -n ${KEPLER_NAMESPACE}
Error from server (NotFound): deployments.apps "kepler-exporter" not found
About this issue
- State: closed
- Created a year ago
- Comments: 17 (9 by maintainers)
@juangascon we’ll fix both the kepler manifests and chart