rook: Cannot run cephcsi in a mixed-architecture kubernetes cluster
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior:
When deploying a new CephCluster with the latest Rook operator installed via Helm chart on ARM64 (Raspberry Pi 4, Ubuntu 18.04), the CSI plugins fail to run with standard_init_linux.go:211: exec user process caused "exec format error". Apparently, there are no arm64 images for these plugins in the quay.io repository. Therefore, the CSI plugins have to be disabled, or Rook downgraded, to make persistence work properly.
Expected behavior: The Ceph cluster should spawn properly, and arm64 images should be pulled for all components.
How to reproduce it (minimal and precise):
Have the Rook operator deployed properly in an ARM64 Kubernetes cluster.
Run kubectl apply -f cluster.yaml with the cluster.yaml below.
Watch the plugin pods spawn and crash into CrashLoopBackOff.
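The failure mode can be confirmed from the node architecture labels and the CSI pod logs (a sketch; the app=csi-rbdplugin label and the driver-registrar container name follow Rook's defaults and may differ in your deployment):

    # A mixed ARCH column means every image in the cluster must be
    # multi-arch (or pinned per node).
    kubectl get nodes -L kubernetes.io/arch

    # The CSI pods crash-loop; their logs show the exec format error.
    kubectl -n rook-ceph get pods -l app=csi-rbdplugin
    kubectl -n rook-ceph logs <csi-rbdplugin-pod> -c driver-registrar
    # -> standard_init_linux.go:211: exec user process caused "exec format error"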
File(s) to submit:
- Cluster CR (custom resource), typically called cluster.yaml:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14.2.4-20190917
    allowUnsupported: false
  dataDirHostPath: /storage
  skipUpgradeChecks: false
  mon:
    count: 1
    allowMultiplePerNode: true
  dashboard:
    enabled: true
    ssl: true
  monitoring:
    enabled: false
    rulesNamespace: rook-ceph
  network:
    hostNetwork: false
  rbdMirroring:
    workers: 0
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: "node1"
        directories:
          - path: /storage
Environment:
- OS: Ubuntu 18.04 aarch64
- Kernel: custom 4.19.76 kernel compiled with the rbd module
- Cloud provider or hardware configuration: Raspberry Pi 4
- Rook version: v1.1.2
- Storage backend version: ceph/ceph:v14.2.4-20190917
- Kubernetes version: v1.16.0
- Kubernetes cluster type: Kubespray
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 5
- Comments: 39 (22 by maintainers)
@Madhu-1’s images worked very well for me on my Pi 4 cluster, but the arch-specific tag approach would be difficult to handle in the case of a multi-architecture cluster (AFAIK you can’t set NodeSelectors on the sets the operator makes). I’ve rebuilt the latest CSI images from the official GitHub repos using docker buildx and published them on Docker Hub under a multiarch manifest. As for the cephcsi images, they’re a copy of the ones on Quay.io, but with the arch tags merged into one.
(Edit: This has been automated into the Raspbernetes multi-arch-images project. You should probably use them instead of mine.)
(As a heads up I haven’t tested these on amd64 yet, but I plan to! They’re running on my Pi 4s okay)
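For anyone wanting to do the same tag merge themselves, the core of it is the experimental docker manifest command (a sketch; the example/ target and the per-arch source tags are placeholders, not the actual Quay tag names):

    # Combine existing per-arch tags into a single multi-arch manifest
    # list (requires "experimental": "enabled" in ~/.docker/config.json).
    docker manifest create example/cephcsi:v1.2.2 \
      quay.io/cephcsi/cephcsi:v1.2.2-amd64 \
      quay.io/cephcsi/cephcsi:v1.2.2-arm64
    docker manifest push example/cephcsi:v1.2.2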
There’s movement to get the official CSI images done this way - watch this space: https://github.com/kubernetes-csi/external-attacher/pull/224
If it’s of interest to anyone (I forgot to update my comment), I’ve added multiarch images to the Raspbernetes collection of Docker images. These are all true multiarch and should automatically build new releases until the upstream sources release these directly.
@Weizhuo-Zhang this will save you from needing to use unversioned images from a number of sources. 🙂
External CSI sidecar containers are not arm64-compatible; there is work going on in the kubernetes-csi repo. If you want to try things on arm64, you need to build them yourself.
I built sidecar images for arm64; if you want to try CSI on arm64 you can use these.
Reopening
Closing this one as multi-arch support is now fixed in https://github.com/ceph/ceph-csi/pull/1241; a canary image is available at https://quay.io/repository/cephcsi/cephcsi?tab=tags
Would it be possible to keep this issue open until we are able to build a multi-arch image? It’s hard to integrate rook-ceph into a multi-arch Kubernetes cluster without this.
@jamesorlakin can you please open an issue with the kubernetes-csi repo?
Created a tracker issue in ceph-csi: https://github.com/ceph/ceph-csi/issues/1003
Is there any documentation on how to get Rook running on an aarch64 cluster? My environment is as follows: Hardware: 5x Jetson Nano (Ubuntu 18.04 Linux4Tegra).
I run Kubernetes on this using the k3s distribution from Rancher. This works pretty well with Docker as a backend to get GPU support in my pods.
I installed Rook via the standard manifests:
- rook/cluster/examples/kubernetes/ceph/common.yaml
- rook/cluster/examples/kubernetes/ceph/operator.yaml
- rook/cluster/examples/kubernetes/ceph/cluster.yaml
with very few adjustments (only added a device filter for sda).
This gives me a running Ceph cluster with 1.2 TB storage (5x 256 GB SanDisk Pro USB sticks).
But all the csi-*plugin pods fail, so I updated the operator.yaml file to explicitly set arm64-capable CSI images.
This got me one step closer, but the registrar image still seemed to be amd64, so I also added a registrar image override (which I found in another bug report).
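For reference, overrides of this kind can be expressed through the operator's image settings (a sketch, assuming Rook v1.1's ROOK_CSI_* environment variables; the example/ image names are placeholders for whatever arm64 or multi-arch builds you are using):

    # Equivalent to editing the env: section of the rook-ceph-operator
    # deployment in operator.yaml.
    kubectl -n rook-ceph set env deployment/rook-ceph-operator \
      ROOK_CSI_CEPH_IMAGE=example/cephcsi:v1.2.2 \
      ROOK_CSI_REGISTRAR_IMAGE=example/csi-node-driver-registrar:v1.1.0 \
      ROOK_CSI_PROVISIONER_IMAGE=example/csi-provisioner:v1.3.0 \
      ROOK_CSI_ATTACHER_IMAGE=example/csi-attacher:v1.2.0 \
      ROOK_CSI_SNAPSHOTTER_IMAGE=example/csi-snapshotter:v1.2.0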
This gave me running csi-*plugin pods, but there are still failing pods for csi-*plugin-provisioner.
If I describe the pods, they are still using Image: quay.io/cephcsi/cephcsi:v1.2.2. So does anyone have a good description of how to get Rook with CSI running on an aarch64 cluster?
For what it’s worth, here is the output from /proc/cpuinfo. I am happy for any hints on how to get this running. Thanks a lot!
@cyb70289 It might be possible for us to get an arm64 hardware node for CI via the CNCF and/or Packet.
If we did, could we plumb the node in for testing Rook ARM deployments to get around the Travis limits?
@cyb70289 The easiest way to perform the same build on multiple architectures is to embed the entire build process in a Dockerfile, then use multi-architecture Docker builds. There’s a demonstration of two ways to do it at https://github.com/cjyar/docker-multi-arch/tree/buildx.
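In short, the buildx flow there boils down to the following (a sketch; assumes Docker 19.03+ with buildx, and the example/ tag is a placeholder):

    # Register QEMU binfmt handlers so an amd64 host can cross-build
    # arm64 stages (one-time setup per host).
    docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

    # Build from one Dockerfile and push a single manifest list that
    # covers both architectures.
    docker buildx create --name multiarch --use
    docker buildx build \
      --platform linux/amd64,linux/arm64 \
      --tag example/ceph-multiarch:latest \
      --push .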
I’d like to help move this forward, but I don’t want to step on anybody’s toes. I thought I might start with a PR for ceph/ceph (or ceph/ceph-container, actually) to do multi-architecture builds this way, instead of however they’re doing it now. ceph-csi is based on this image.
@billimek, we see exactly the same problem, and are planning to add arm64 support to all these images. It definitely needs community help to make it happen.