kubernetes: How to use RBD volumes (pod fails to start with error "rbd: failed to modprobe rbd")
Hello kubernetes,
I am trying to follow the instructions from the rbd example. After successfully booting a ceph demo cluster (sudo ceph -s on the host displays HEALTH_OK) and manually creating a foo rbd image formatted as ext4, I cannot start any pod that uses rbd volumes.
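For reference, creating and formatting such an image on the host boils down to something like the following (the 1 GiB size is just an example, and the /dev/rbd/rbd/foo device path assumes the default udev naming):

# create a 1 GiB rbd image named foo in the default rbd pool
rbd create foo --size 1024
# map it to a local block device, format it as ext4, then unmap it
sudo rbd map foo --pool rbd
sudo mkfs.ext4 /dev/rbd/rbd/foo
sudo rbd unmap /dev/rbd/rbd/foo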
The rbd2 pod never starts; it stays in the ContainerCreating state, as shown by the kubectl get pod output below:
NAME                   READY     STATUS              RESTARTS   AGE
k8s-etcd-127.0.0.1     1/1       Running             0          49m
k8s-master-127.0.0.1   4/4       Running             4          50m
k8s-proxy-127.0.0.1    1/1       Running             0          49m
rbd2                   0/1       ContainerCreating   0          9m
I am using kubernetes 1.2.1 with docker 1.9.1 on an ubuntu 14.04 amd64 host, running the single-node docker cluster setup.
The output of kubectl describe pods rbd2 is the following:
Name:           rbd2
Namespace:      default
Node:           127.0.0.1/127.0.0.1
Start Time:     Wed, 06 Apr 2016 18:38:22 +0200
Labels:         <none>
Status:         Pending
IP:
Controllers:    <none>
Containers:
  rbd-rw:
    Container ID:
    Image:              nginx
    Image ID:
    Port:
    QoS Tier:
      cpu:              BestEffort
      memory:           BestEffort
    State:              Waiting
      Reason:           ContainerCreating
    Ready:              False
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Ready         False
Volumes:
  rbdpd:
    Type:               RBD (a Rados Block Device mount on the host that shares a pod's lifetime)
    CephMonitors:       [172.17.42.1:6789]
    RBDImage:           foo
    FSType:             ext4
    RBDPool:            rbd
    RadosUser:          admin
    Keyring:            /etc/ceph/ceph.client.admin.keyring
    SecretRef:          &{ceph-secret}
    ReadOnly:           true
  default-token-1ze78:
    Type:               Secret (a volume populated by a Secret)
    SecretName:         default-token-1ze78
Events:
  FirstSeen   LastSeen   Count   From                   SubobjectPath   Type      Reason        Message
  ---------   --------   -----   ----                   -------------   --------  ------        -------
  7m          7m         1       {default-scheduler }                   Normal    Scheduled     Successfully assigned rbd2 to 127.0.0.1
  7m          7s         33      {kubelet 127.0.0.1}                    Warning   FailedMount   Unable to mount volumes for pod "rbd2_default(fa59e744-fc15-11e5-8533-28d2444cbe8c)": rbd: failed to modprobe rbd error:exit status 1
  7m          7s         33      {kubelet 127.0.0.1}                    Warning   FailedSync    Error syncing pod, skipping: rbd: failed to modprobe rbd error:exit status 1
In the kubelet docker log, I can see the following trace, repeated multiple times.
I0406 16:44:56.885150 8236 rbd.go:89] ceph secret info: key/AQCyJQVXJV4gERAA1q7y4Wi6MiuO8UahSQoIrg==
I0406 16:44:56.887715 8236 nsenter_mount.go:179] Failed findmnt command: exit status 1
E0406 16:44:57.889282 8236 disk_manager.go:56] failed to attach disk
E0406 16:44:57.889295 8236 rbd.go:208] rbd: failed to setup
E0406 16:44:57.889334 8236 kubelet.go:1780] Unable to mount volumes for pod "rbd2_default(fa59e744-fc15-11e5-8533-28d2444cbe8c)": rbd: failed to modprobe rbd error:exit status 1; skipping pod
E0406 16:44:57.889340 8236 pod_workers.go:138] Error syncing pod fa59e744-fc15-11e5-8533-28d2444cbe8c, skipping: rbd: failed to modprobe rbd error:exit status 1
I0406 16:44:58.884709 8236 nsenter_mount.go:179] Failed findmnt command: exit status 1
As I understand the logs above, the kubelet container is trying to run something like modprobe rbd inside itself (or somewhere else?) and that fails. I noticed that there is no modprobe command inside the kubelet container (image: gcr.io/google_containers/hyperkube-amd64:v1.2.1), so I manually ran apt-get update && apt-get install kmod to make that command available inside the container, but without success.
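A quick way to verify whether the rbd kernel module can actually be loaded on the host itself (as opposed to inside the kubelet container) is something like:

# check whether the rbd module is already loaded on the host
lsmod | grep rbd
# attempt to load it; this must succeed on the machine that maps the device
sudo modprobe rbd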
My files look like this:
# secret/ceph-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: QVFDeUpRVlhKVjRnRVJBQTFxN3k0V2k2TWl1TzhVYWhTUW9Jcmc9PQo=
# rbd-pod.yaml
apiVersion: "v1"
kind: "Pod"
metadata:
  name: "rbd2"
spec:
  containers:
    - name: "rbd-rw"
      image: "nginx"
      volumeMounts:
        - mountPath: "/var/www/html"
          name: "rbdpd"
  volumes:
    - name: "rbdpd"
      rbd:
        monitors:
          - "172.17.42.1:6789"
        pool: "rbd"
        image: "foo"
        user: "admin"
        secretRef:
          name: "ceph-secret"
        fsType: "ext4"
        keyring: "/etc/ceph/ceph.client.admin.keyring"
        readOnly: true
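For completeness, the key field of the ceph-secret above is just the base64-encoded Ceph admin key; assuming the standard client.admin user, it can be produced with something like:

# base64-encode the ceph admin key for use in the kubernetes Secret
sudo ceph auth get-key client.admin | base64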
I have checked that 172.17.42.1:6789 is reachable from the kubernetes cluster (since the kubelet container is started with --net=host).
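A check along these lines, run from inside the kubelet container, is one way to confirm that:

# verify that the ceph monitor port answers, with a 3 second timeout
nc -z -w 3 172.17.42.1 6789 && echo "monitor reachable"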
How can I mount RBD volumes inside a container as of kubernetes 1.2.1?
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 27 (14 by maintainers)
Commits related to this issue
- Add workaround for Ceph in kubelet pod. https://github.com/kubernetes/kubernetes/issues/23924 — committed to localghost/rancher-catalog by deleted user 7 years ago
- Merge pull request #23924 from jsafrane/fix-late-binding-msg Bug 1754840: Return proper error message when BindPodVolumes fails Origin-commit: dc84d06390f0862b3bf5c6a5874024838433d765 — committed to openshift/kubernetes by k8s-publishing-bot 5 years ago
Also dealing with “Could not map image: Timeout after 10s”. Is there a solution?
If I bind-mount /dev on the host to /dev in the kubelet container, the kubelet container messes with the pts devices on my host, resulting in being unable to start new terminals (gnome-terminal fails with a “getpt failed” message) and making it impossible to properly shut down my workstation.
EDIT: after checking the issue tracker, the breakage resulting from bind-mounting /dev from the host into the kubelet container is documented in #18230.
What I finally did was to wrap the hyperkube image like this:
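A minimal sketch of such a wrapper, assuming the hyperkube-amd64:v1.2.1 base image and the kmod/ceph-common packages mentioned earlier in this thread (the hyperkube-rbd image name is only illustrative):

# build a wrapped hyperkube image that contains modprobe and the rbd CLI
cat > Dockerfile <<'EOF'
FROM gcr.io/google_containers/hyperkube-amd64:v1.2.1
# kmod provides modprobe, ceph-common provides the rbd client tools
RUN apt-get update && \
    apt-get install -y kmod ceph-common && \
    rm -rf /var/lib/apt/lists/*
EOF
docker build -t hyperkube-rbd:v1.2.1 .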
And then run the kubelet container like this:
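A sketch of the corresponding run command, based on the standard single-node docker setup for kubernetes 1.2, using the wrapped image and adding a read-only /lib/modules bind mount so that modprobe can find the host's modules (the exact set of flags here is an assumption):

# run the kubelet from the wrapped image; /lib/modules is bind-mounted read-only
docker run -d \
  --name=kubelet \
  --net=host \
  --pid=host \
  --privileged=true \
  --volume=/:/rootfs:ro \
  --volume=/sys:/sys:ro \
  --volume=/lib/modules:/lib/modules:ro \
  --volume=/var/lib/docker/:/var/lib/docker:rw \
  --volume=/var/lib/kubelet/:/var/lib/kubelet:rw \
  --volume=/var/run:/var/run:rw \
  hyperkube-rbd:v1.2.1 \
  /hyperkube kubelet \
    --containerized \
    --hostname-override=127.0.0.1 \
    --api-servers=http://localhost:8080 \
    --config=/etc/kubernetes/manifests \
    --allow-privileged=true \
    --v=2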
Then I can use rbd persistent volumes from my dockerized kubernetes setup.
The modprobe rbd failure is the problem. What kubelet container image are you using? Can you install modprobe in your image?
@jperville have you tried apt-get installing the ceph-common package?