kepler: Unable to deploy Kepler on GKE with the Helm chart when nodes run Container-Optimized OS (COS)

What happened?

When we deploy Kepler with the Helm chart on a GKE (Google Kubernetes Engine) cluster whose nodes use the COS-containerd image (COS = Container-Optimized OS), the kepler-exporter pod stays in the “ContainerCreating” status. This happens because the chart tries to mount the host directory /usr/src, as shown below. We can deploy Kepler manually with the manifests and kubectl, but we have to deploy the Prometheus Operator before kepler-exporter. Otherwise we get the error “ensure CRDs are installed first”.

The Helm chart works smoothly when the GKE nodes use the Ubuntu-containerd image.

After running:

kubectl -n kepler get pods
kubectl -n kepler describe pod <a_pod_from_previous_command>

we get something like this:

Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  DirectoryOrCreate
  tracing:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:  Directory
  proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:  Directory
  usr-src:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/src
    HostPathType:  Directory
(...)
Events:
  Type     Reason       Age                   From               Message
  ----     ------       ----                  ----               -------
  Normal   Scheduled    36m                   default-scheduler  Successfully assigned kepler/kepler-5rptn to gke-tf-poc-79r5-tf-poc-grackle-493f7a92-8nz5
  Warning  FailedMount  25m                   kubelet            Unable to attach or mount volumes: unmounted volumes=[usr-src], unattached volumes=[lib-modules tracing proc usr-src kube-api-access-djkwx]: timed out waiting for the condition
  Warning  FailedMount  5m53s (x23 over 36m)  kubelet            MountVolume.SetUp failed for volume "usr-src" : hostPath type check failed: /usr/src is not a directory

This means the kubelet could not mount the usr-src volume. Its HostPathType is Directory, so the mount fails the type check because /usr/src does not exist on COS nodes. It exists only on the Ubuntu (20.04+ LTS) nodes.
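To check directly on a node which of the pod's host paths would fail a strict Directory check, a small preflight script can help. This is a minimal sketch, not the kubelet's implementation. The volume-to-path table is copied from the describe output above, and the helper name failing_host_paths is hypothetical. Volumes of type DirectoryOrCreate are skipped, because Kubernetes documents that the kubelet creates a missing path for that type instead of failing the check.

```python
import os

# Host paths mounted by the kepler-exporter pod, copied from the
# `kubectl describe pod` output, with the HostPathType each one uses.
KEPLER_HOST_PATHS = {
    "lib-modules": ("/lib/modules", "DirectoryOrCreate"),
    "tracing": ("/sys", "Directory"),
    "proc": ("/proc", "Directory"),
    "usr-src": ("/usr/src", "Directory"),
}


def failing_host_paths(volumes, root="/"):
    """Return the names of volumes whose strict `Directory` check would fail.

    Hypothetical helper that only approximates the kubelet check: a
    `Directory` volume needs an existing directory at its path. Other
    types (e.g. `DirectoryOrCreate`) are skipped.
    """
    failing = []
    for name, (path, host_path_type) in sorted(volumes.items()):
        if host_path_type != "Directory":
            continue
        # Resolve the host path under `root` so the check can be tested off-node.
        candidate = os.path.join(root, path.lstrip("/"))
        if not os.path.isdir(candidate):
            failing.append(name)
    return failing


if __name__ == "__main__":
    # Run on a node. Given this report, COS nodes are expected to list
    # usr-src, while Ubuntu nodes are expected to list nothing.
    print(failing_host_paths(KEPLER_HOST_PATHS))
```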

On an Ubuntu node, by contrast:

juan@node:$ ls /usr/src
linux-gcp-5.19-headers-5.19.0-1026
linux-headers-5.19.0-1026-gcp
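The Ubuntu listing shows that /usr/src holds kernel header packages. A plausible reason for the mount, though the issue does not say so, is that Kepler wants the headers that match the running kernel. The sketch below checks for such a directory. It assumes the Debian/Ubuntu naming scheme "linux-headers-<uname -r>", and the helper name matching_header_dirs is hypothetical.

```python
import os
import platform


def matching_header_dirs(entries, release):
    """Return the /usr/src entries that look like headers for `release`.

    Assumes Debian/Ubuntu naming, where the directory is called
    "linux-headers-<uname -r>" (e.g. linux-headers-5.19.0-1026-gcp).
    """
    wanted = "linux-headers-" + release
    return [entry for entry in entries if entry == wanted]


if __name__ == "__main__":
    src = "/usr/src"
    if os.path.isdir(src):
        # platform.release() returns the same string as `uname -r`.
        print(matching_header_dirs(os.listdir(src), platform.release()))
    else:
        print(src, "does not exist on this node")
```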

What did you expect to happen?

A smooth and complete deployment of kepler-exporter. 😃

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a GKE cluster (version 1.25+) in GCP with COS nodes.
  2. Get the kubeconfig or the credentials to connect to the cluster, with a command like:
       gcloud container clusters get-credentials <cluster_name> --zone=<cluster_gcp_zone>
  3. Follow the Helm chart installation process in the Kepler documentation:
       helm repo add kepler https://sustainable-computing-io.github.io/kepler-helm-chart
       helm install kepler kepler/kepler --namespace kepler --create-namespace
  4. Check the status of the pod with:
       kubectl -n kepler get pods
       kubectl -n kepler describe pod <a_pod_from_previous_command>
     or list the namespace events with:
       kubectl -n kepler get events --sort-by=.metadata.creationTimestamp
     The pod description contains the message:
       Unable to attach or mount volumes: unmounted volumes=[usr-src], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
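When checking many pods or events from step 4, it can help to extract the volume names from the FailedMount messages automatically. The sketch below parses the two message shapes quoted in this report. The regular expression is an assumption based only on those samples, not on a documented kubelet message format, and volumes_in_message is a hypothetical name.

```python
import re

# Matches the bracketed lists in FailedMount messages such as
# "unmounted volumes=[usr-src], unattached volumes=[...]".
# Assumption: derived from the messages in this report only.
_LIST_RE = re.compile(r"(unmounted|unattached|failed to process) volumes=\[([^\]]*)\]")


def volumes_in_message(message):
    """Map each list kind ('unmounted', ...) to the volume names it contains."""
    result = {}
    for kind, names in _LIST_RE.findall(message):
        # Names are space-separated inside the brackets; "[]" yields [].
        result[kind] = names.split()
    return result


if __name__ == "__main__":
    sample = ("Unable to attach or mount volumes: unmounted volumes=[usr-src], "
              "unattached volumes=[], failed to process volumes=[]: "
              "timed out waiting for the condition")
    print(volumes_in_message(sample)["unmounted"])
```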

Anything else we need to know?

In case it is relevant: GKE deploys Ubuntu 22.04 LTS nodes when the GKE version is at least 1.25.8-gke.500 in the Regular channel. The cgroups on these nodes are v2 (cgroup2fs):

juan@node:$ stat -fc %T /sys/fs/cgroup/
cgroup2fs
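The same check can be done without stat by reading /proc/mounts. This is a minimal sketch, assuming the usual layouts: a cgroup2 filesystem mounted at /sys/fs/cgroup (what stat reports as cgroup2fs) means the unified v2 hierarchy, while a tmpfs there means v1 or a hybrid setup with per-controller mounts below it. The function name cgroup_version is hypothetical.

```python
import os


def cgroup_version(mounts_text, mount_point="/sys/fs/cgroup"):
    """Infer the cgroup layout from the contents of /proc/mounts."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]  # keep the last entry: it is the topmost mount
    if fstype == "cgroup2":
        return "v2"
    if fstype == "tmpfs":
        return "v1-or-hybrid"
    return "unknown"


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print(cgroup_version(f.read()))
```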

Kepler image tag

$ helm search repo kepler
NAME         	CHART VERSION	APP VERSION	DESCRIPTION                                       
kepler/kepler	0.4.2        	release-0.5	A Helm chart for kepler (Kubernetes-based Effic...

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-15T02:15:11Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2-gke.1200", GitCommit:"5319597f0ffe6e93e83a51e280d81fb2028bf4a0", GitTreeState:"clean", BuildDate:"2023-06-01T19:54:16Z", GoVersion:"go1.20.4 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

$ kubectl version --short
Client Version: v1.27.3
Kustomize Version: v5.0.1
Server Version: v1.27.2-gke.1200

Cloud provider or bare metal

Google Cloud Platform (GCP)

OS version

# On Linux:
$ cat /etc/os-release
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
GOOGLE_CRASH_ID=Lakitu
GOOGLE_METRICS_PRODUCT_ID=26
KERNEL_COMMIT_ID=3f78393e3888bca4fdd7aae9d405f27017334f87
VERSION=105
VERSION_ID=105
BUILD_ID=17412.101.13
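Since the failure depends on the node OS, a chart or an install script could detect COS from /etc/os-release, which reports ID=cos above. The sketch below parses that file. It assumes the documented os-release format (KEY=value lines with shell-style quoting), and parse_os_release and is_cos are hypothetical names, not part of Kepler or its chart.

```python
import os
import shlex


def parse_os_release(text):
    """Parse os-release KEY=value lines into a dict, removing shell quoting."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values follow shell quoting rules; shlex.split strips the quotes
        # and, with comments disabled by default, keeps '#' inside URLs.
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info


def is_cos(info):
    """True when the node reports Container-Optimized OS (ID=cos)."""
    return info.get("ID") == "cos"


if __name__ == "__main__":
    if os.path.exists("/etc/os-release"):
        with open("/etc/os-release") as f:
            print(is_cos(parse_os_release(f.read())))
```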

$ uname -a
Linux gke-tf-poc-7cxy-tf-poc-wasp-85ae4487-p6lh 5.15.109+ #1 SMP Sat May 20 10:48:19 UTC 2023 x86_64 Intel(R) Xeon(R) CPU @ 2.20GHz GenuineIntel GNU/Linux

Install tools

helm

Kepler deployment config

On Kubernetes:

$ KEPLER_NAMESPACE=kepler

# provide kepler configmap
$ kubectl get configmap kepler-cfm -n ${KEPLER_NAMESPACE} 
Error from server (NotFound): configmaps "kepler-cfm" not found

# provide kepler deployment description
$ kubectl describe deployment kepler-exporter -n ${KEPLER_NAMESPACE} 
Error from server (NotFound): deployments.apps "kepler-exporter" not found

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, …) and versions (if applicable)

About this issue

  • State: closed
  • Created a year ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

@juangascon we’ll fix both the kepler manifests and chart