rook: Cannot run cephcsi in a mixed-architecture Kubernetes cluster

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior: When deploying a new CephCluster with the latest Rook operator, installed via the Helm chart, on ARM64 (Raspberry Pi 4, Ubuntu 18.04), the CSI plugins fail to run with standard_init_linux.go:211: exec user process caused "exec format error". Apparently, there are no arm64 images for these plugins in the quay.io repository. Therefore, the CSI plugins have to be disabled, or Rook downgraded, to make persistence work properly.

Expected behavior: The Ceph cluster should spawn properly, with arm64 images downloaded for all components.

How to reproduce it (minimal and precise): With the Rook operator deployed in an ARM64 Kubernetes cluster, run kubectl apply -f cluster.yaml on your cluster.yaml, then watch the plugin pods spawn and crash into CrashLoopBackOff.
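The "exec format error" above means the binary inside the image was built for a different CPU architecture than the node. A quick sanity check (a sketch, not from the original report) is to compare the node's architecture against the platforms the image manifest actually publishes:

```shell
#!/bin/sh
# Print the node's CPU architecture; a Raspberry Pi 4 reports "aarch64",
# while the failing images only shipped amd64 binaries.
echo "node architecture: $(uname -m)"

# Hypothetical manifest check (requires a docker CLI with the manifest
# command enabled); an image with no matching "architecture" entry will
# crash on this node with "exec format error":
# docker manifest inspect quay.io/cephcsi/cephcsi:v1.2.2 | grep -i architecture
```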

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14.2.4-20190917
    allowUnsupported: false
  dataDirHostPath: /storage
  skipUpgradeChecks: false
  mon:
    count: 1
    allowMultiplePerNode: true
  dashboard:
    enabled: true
    ssl: true
  monitoring:
    enabled: false
    rulesNamespace: rook-ceph
  network:
    hostNetwork: false
  rbdMirroring:
    workers: 0
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
    - name: "node1"
      directories:
      - path: /storage

Environment:

  • OS: Ubuntu 18.04 aarch64
  • Kernel: custom 4.19.76 kernel compiled with rbd module
  • Cloud provider or hardware configuration: Raspberry Pi 4
  • Rook version: v1.1.2
  • Storage backend version: ceph/ceph:v14.2.4-20190917
  • Kubernetes version: v1.16.0
  • Kubernetes cluster type: Kubespray

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 5
  • Comments: 39 (22 by maintainers)

Most upvoted comments

@Madhu-1’s images worked very well for me on my Pi 4 cluster, but the arch-specific tag approach would be difficult to handle in a multi-architecture cluster (AFAIK you can’t set nodeSelectors on the Deployments/DaemonSets the operator creates). I’ve rebuilt the latest CSI images from the official GitHub repos using docker buildx and published them on Docker Hub under a multi-arch manifest.

As for the cephcsi images, they’re a copy of the ones on Quay.io, but with the arch tags merged into one.

(Edit: This has been automated into the Raspbernetes multi-arch-images project. You should probably use them instead of mine.)

ROOK_CSI_CEPH_IMAGE: "jamesorlakin/multiarch-cephcsi:2.1.0"

ROOK_CSI_RESIZER_IMAGE: "jamesorlakin/multiarch-csi-resizer:0.5.0"
ROOK_CSI_REGISTRAR_IMAGE: "jamesorlakin/multiarch-csi-node-driver-registrar:1.3.0"
ROOK_CSI_PROVISIONER_IMAGE: "jamesorlakin/multiarch-csi-provisioner:1.6.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "jamesorlakin/multiarch-csi-snapshotter:2.1.1"
ROOK_CSI_ATTACHER_IMAGE: "jamesorlakin/multiarch-csi-attacher:2.1.0"

(As a heads-up, I haven’t tested these on amd64 yet, but I plan to! They’re running fine on my Pi 4s.)
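For reference, a sketch (not from the thread itself) of where these keys go: in Rook v1.x they are read as environment variables on the rook-ceph-operator container in operator.yaml, so the overrides above could be expressed as:

```yaml
# Sketch: CSI image overrides as env entries on the rook-ceph-operator
# container in operator.yaml (tags repeat the list above)
env:
- name: ROOK_CSI_CEPH_IMAGE
  value: "jamesorlakin/multiarch-cephcsi:2.1.0"
- name: ROOK_CSI_RESIZER_IMAGE
  value: "jamesorlakin/multiarch-csi-resizer:0.5.0"
- name: ROOK_CSI_REGISTRAR_IMAGE
  value: "jamesorlakin/multiarch-csi-node-driver-registrar:1.3.0"
- name: ROOK_CSI_PROVISIONER_IMAGE
  value: "jamesorlakin/multiarch-csi-provisioner:1.6.0"
- name: ROOK_CSI_SNAPSHOTTER_IMAGE
  value: "jamesorlakin/multiarch-csi-snapshotter:2.1.1"
- name: ROOK_CSI_ATTACHER_IMAGE
  value: "jamesorlakin/multiarch-csi-attacher:2.1.0"
```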

There’s movement to get the official CSI images done this way - watch this space: https://github.com/kubernetes-csi/external-attacher/pull/224

If it’s of interest to anyone (I forgot to update my comment), I’ve added multiarch images to the Raspbernetes collection of Docker images. These are all true multiarch and should automatically build new releases until the upstream sources release these directly.

@Weizhuo-Zhang this will save you needing to use unversioned images from a number of sources. 🙂

The external CSI sidecar containers are not arm64-compatible. There is work going on in the kubernetes-csi repos; if you want to try things on arm64, you need to build them yourself.

I built sidecar images for arm64; if you want to try CSI on arm64, you can use these:

madhupr001/csi-resizer:v0.4.0-arm64
madhupr001/csi-provisioner:v1.4.0-arm64
madhupr001/csi-attacher:v1.2.1-arm64
madhupr001/csi-snapshotter:v1.2.0-arm64

Reopening

Closing this one, as multi-arch support is now fixed in https://github.com/ceph/ceph-csi/pull/1241; a canary image is available at https://quay.io/repository/cephcsi/cephcsi?tab=tags

Would it be possible to keep this issue open until we are able to build a multi-arch image? It’s hard to integrate rook-ceph into a multi-arch Kubernetes cluster without this.

@jamesorlakin can you please open an issue with the kubernetes-csi repo?

Created a tracker issue in cephcsi: https://github.com/ceph/ceph-csi/issues/1003

Is there any documentation on how to get Rook running on an aarch64 cluster? My environment is as follows: Hardware: 5x Jetson Nano (Ubuntu 18.04 Linux4Tegra)

I run Kubernetes on this using the k3s distribution from Rancher. This works pretty well with Docker as a backend to get GPU support in my pods.

I installed Rook via the standard manifests:

rook/cluster/examples/kubernetes/ceph/common.yaml
rook/cluster/examples/kubernetes/ceph/operator.yaml
rook/cluster/examples/kubernetes/ceph/cluster.yaml

with very little adjustments (only added a device filter for sda)

This gives me a running Ceph cluster with 1.2 TB of storage (5x 256 GB SanDisk Pro USB sticks).

But all the csi-*plugin pods fail.

So I updated the operator.yaml file to explicitly use:

- name: ROOK_CSI_CEPH_IMAGE
  value: "quay.io/cephcsi/cephcsi:v2.0.0-arm64"

This got me one step closer, but the registrar image still seemed to be amd64, so I also added this (which I found in another bug report):

- name: ROOK_CSI_REGISTRAR_IMAGE
  value: "colek42/csi-node-driver-registrar"

This gave me running csi-*plugin pods, but the csi-*plugin-provisioner pods are still failing.

If I describe those pods, they are still using Image: quay.io/cephcsi/cephcsi:v1.2.2

So does anyone have a good description of how to get Rook with CSI running on an aarch64 cluster?
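One thing worth checking in this situation (a sketch, not from the thread): the provisioner Deployments are created by the operator, so after editing operator.yaml the operator has to reconcile before the new image settings take effect. The namespace and label below assume the standard Rook example manifests.

```shell
# Re-apply the edited operator manifest and restart the operator pod so
# the CSI Deployments/DaemonSets are recreated with the new images
kubectl apply -f operator.yaml
kubectl -n rook-ceph delete pod -l app=rook-ceph-operator

# Then verify which image each CSI pod actually runs
kubectl -n rook-ceph get pods \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```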

For what it’s worth, here is the output of /proc/cpuinfo:

processor       : 0
model name      : ARMv8 Processor rev 1 (v8l)
BogoMIPS        : 38.40
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd07
CPU revision    : 1

I am happy for any hints on how to get this running. Thanks a lot!

@cyb70289 it might be possible for us to get an arm64 hardware node for CI via the CNCF and/or Packet.

If we did, could we plumb the node in for testing Rook ARM deployments to get around the Travis limits?

@cyb70289 The easiest way to perform the same build on multiple architectures is to embed the entire build process in a Dockerfile, then use multi-architecture Docker builds. There’s a demonstration of two ways to do it at https://github.com/cjyar/docker-multi-arch/tree/buildx.
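The flow described above could look roughly like this (a sketch; the image name is a placeholder, and the Dockerfile is assumed to build the component from source so each platform gets a native binary):

```shell
# Create and select a buildx builder (one-time setup)
docker buildx create --use

# Build for both architectures from one Dockerfile and push a single
# multi-arch manifest that nodes resolve to their native image
docker buildx build --platform linux/amd64,linux/arm64 \
  -t example/cephcsi:multiarch --push .
```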

I’d like to help move this forward, but I don’t want to step on anybody’s toes. I thought I might start with a PR against ceph/ceph (or actually ceph/ceph-container) to do multi-architecture builds this way, instead of however they’re doing it now. ceph-csi is based on that image.

@billimek, we see exactly the same problem, and we are planning to add arm64 support to all of these images. This definitely needs community help to make it happen.