kubernetes: How to use RBD volumes (pod fails to start with error "rbd: failed to modprobe rbd")
Hello kubernetes,
I am trying to follow the instructions from the rbd example. After successfully booting a ceph demo cluster (sudo ceph -s on the host displays HEALTH_OK) and manually creating a foo rbd image formatted as ext4, I cannot start any pod that uses rbd volumes.
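For reference, creating and formatting such an image on the host boils down to something like the following (the 1 GiB size is just an example, and the /dev/rbd/rbd/foo device path assumes the default udev naming):

# create a 1 GiB rbd image named foo in the default rbd pool
rbd create foo --size 1024
# map it to a local block device, format it as ext4, then unmap it
sudo rbd map foo --pool rbd
sudo mkfs.ext4 /dev/rbd/rbd/foo
sudo rbd unmap /dev/rbd/rbd/foo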
The rbd2 pod never starts; it stays in the ContainerCreating state, as shown by the kubectl get pod output below:
NAME                   READY     STATUS              RESTARTS   AGE
k8s-etcd-127.0.0.1     1/1       Running             0          49m
k8s-master-127.0.0.1   4/4       Running             4          50m
k8s-proxy-127.0.0.1    1/1       Running             0          49m
rbd2                   0/1       ContainerCreating   0          9m
I am using kubernetes 1.2.1 with docker 1.9.1 on an ubuntu 14.04 amd64 host, running the single-node docker cluster setup.
The output of kubectl describe pods rbd2 is the following:
Name:           rbd2
Namespace:      default
Node:           127.0.0.1/127.0.0.1
Start Time:     Wed, 06 Apr 2016 18:38:22 +0200
Labels:         <none>
Status:         Pending
IP:
Controllers:    <none>
Containers:
  rbd-rw:
    Container ID:
    Image:              nginx
    Image ID:
    Port:
    QoS Tier:
      cpu:              BestEffort
      memory:           BestEffort
    State:              Waiting
      Reason:           ContainerCreating
    Ready:              False
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Ready         False
Volumes:
  rbdpd:
    Type:               RBD (a Rados Block Device mount on the host that shares a pod's lifetime)
    CephMonitors:       [172.17.42.1:6789]
    RBDImage:           foo
    FSType:             ext4
    RBDPool:            rbd
    RadosUser:          admin
    Keyring:            /etc/ceph/ceph.client.admin.keyring
    SecretRef:          &{ceph-secret}
    ReadOnly:           true
  default-token-1ze78:
    Type:               Secret (a volume populated by a Secret)
    SecretName:         default-token-1ze78
Events:
  FirstSeen   LastSeen   Count   From                   SubobjectPath   Type      Reason        Message
  ---------   --------   -----   ----                   -------------   --------  ------        -------
  7m          7m         1       {default-scheduler }                   Normal    Scheduled     Successfully assigned rbd2 to 127.0.0.1
  7m          7s         33      {kubelet 127.0.0.1}                    Warning   FailedMount   Unable to mount volumes for pod "rbd2_default(fa59e744-fc15-11e5-8533-28d2444cbe8c)": rbd: failed to modprobe rbd error:exit status 1
  7m          7s         33      {kubelet 127.0.0.1}                    Warning   FailedSync    Error syncing pod, skipping: rbd: failed to modprobe rbd error:exit status 1
In the kubelet docker log, I can see the following trace, repeated multiple times.
I0406 16:44:56.885150 8236 rbd.go:89] ceph secret info: key/AQCyJQVXJV4gERAA1q7y4Wi6MiuO8UahSQoIrg==
I0406 16:44:56.887715 8236 nsenter_mount.go:179] Failed findmnt command: exit status 1
E0406 16:44:57.889282 8236 disk_manager.go:56] failed to attach disk
E0406 16:44:57.889295 8236 rbd.go:208] rbd: failed to setup
E0406 16:44:57.889334 8236 kubelet.go:1780] Unable to mount volumes for pod "rbd2_default(fa59e744-fc15-11e5-8533-28d2444cbe8c)": rbd: failed to modprobe rbd error:exit status 1; skipping pod
E0406 16:44:57.889340 8236 pod_workers.go:138] Error syncing pod fa59e744-fc15-11e5-8533-28d2444cbe8c, skipping: rbd: failed to modprobe rbd error:exit status 1
I0406 16:44:58.884709 8236 nsenter_mount.go:179] Failed findmnt command: exit status 1
As I understand the logs above, the kubelet container is trying to run something like modprobe rbd inside itself (or somewhere else?) and that fails. I noticed that there is no modprobe command inside the kubelet container (image: gcr.io/google_containers/hyperkube-amd64:v1.2.1), so I manually ran apt-get update && apt-get install kmod to make that command available inside the container, but without success.
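A quick way to verify whether the rbd kernel module can actually be loaded on the host itself (as opposed to inside the kubelet container) is something like:

# check whether the rbd module is already loaded on the host
lsmod | grep rbd
# attempt to load it; this must succeed on the machine that maps the device
sudo modprobe rbd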
My files look like this:
# secret/ceph-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: QVFDeUpRVlhKVjRnRVJBQTFxN3k0V2k2TWl1TzhVYWhTUW9Jcmc9PQo=
# rbd-pod.yaml
apiVersion: "v1"
kind: "Pod"
metadata:
  name: "rbd2"
spec:
  containers:
    - name: "rbd-rw"
      image: "nginx"
      volumeMounts:
        - mountPath: "/var/www/html"
          name: "rbdpd"
  volumes:
    - name: "rbdpd"
      rbd:
        monitors:
          - "172.17.42.1:6789"
        pool: "rbd"
        image: "foo"
        user: "admin"
        secretRef:
          name: "ceph-secret"
        fsType: "ext4"
        keyring: "/etc/ceph/ceph.client.admin.keyring"
        readOnly: true
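For completeness, the key field of the ceph-secret above is just the base64-encoded Ceph admin key; assuming the standard client.admin user, it can be produced with something like:

# base64-encode the ceph admin key for use in the kubernetes Secret
sudo ceph auth get-key client.admin | base64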
I have checked that 172.17.42.1:6789 is reachable from the kubernetes cluster (since the kubelet container is started with --net=host).
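A check along these lines, run from inside the kubelet container, is one way to confirm that:

# verify that the ceph monitor port answers, with a 3 second timeout
nc -z -w 3 172.17.42.1 6789 && echo "monitor reachable"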
How can I mount RBD volumes inside a container as of kubernetes 1.2.1?
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 27 (14 by maintainers)
Commits related to this issue
- Add workaround for Ceph in kubelet pod. https://github.com/kubernetes/kubernetes/issues/23924 — committed to localghost/rancher-catalog by deleted user 7 years ago
- Merge pull request #23924 from jsafrane/fix-late-binding-msg Bug 1754840: Return proper error message when BindPodVolumes fails Origin-commit: dc84d06390f0862b3bf5c6a5874024838433d765 — committed to openshift/kubernetes by k8s-publishing-bot 5 years ago
Also dealing with “Could not map image: Timeout after 10s”. Is there a solution?
If I bind-mount /dev on the host to /dev in the kubelet container, the kubelet container messes with the pts devices on my host, resulting in being unable to start new terminals (gnome-terminal fails with a “getpt failed” message) and making it impossible to properly shut down my workstation.
EDIT: after checking the issue tracker, the breakage resulting from bind-mounting /dev from the host into the kubelet container is documented in #18230.
What I finally did was to wrap the hyperkube image like this:
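A minimal sketch of such a wrapper, assuming the hyperkube-amd64:v1.2.1 base image and the kmod/ceph-common packages mentioned earlier in this thread (the hyperkube-rbd image name is only illustrative):

# build a wrapped hyperkube image that contains modprobe and the rbd CLI
cat > Dockerfile <<'EOF'
FROM gcr.io/google_containers/hyperkube-amd64:v1.2.1
# kmod provides modprobe, ceph-common provides the rbd client tools
RUN apt-get update && \
    apt-get install -y kmod ceph-common && \
    rm -rf /var/lib/apt/lists/*
EOF
docker build -t hyperkube-rbd:v1.2.1 .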
And then run the kubelet container like this:
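A sketch of the corresponding run command, based on the standard single-node docker setup for kubernetes 1.2, using the wrapped image and adding a read-only /lib/modules bind mount so that modprobe can find the host's modules (the exact set of flags here is an assumption):

# run the kubelet from the wrapped image; /lib/modules is bind-mounted read-only
docker run -d \
  --name=kubelet \
  --net=host \
  --pid=host \
  --privileged=true \
  --volume=/:/rootfs:ro \
  --volume=/sys:/sys:ro \
  --volume=/lib/modules:/lib/modules:ro \
  --volume=/var/lib/docker/:/var/lib/docker:rw \
  --volume=/var/lib/kubelet/:/var/lib/kubelet:rw \
  --volume=/var/run:/var/run:rw \
  hyperkube-rbd:v1.2.1 \
  /hyperkube kubelet \
    --containerized \
    --hostname-override=127.0.0.1 \
    --api-servers=http://localhost:8080 \
    --config=/etc/kubernetes/manifests \
    --allow-privileged=true \
    --v=2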
Then I can use rbd persistent volumes from my dockerized kubernetes setup.
The modprobe rbd failure is the problem. What kubelet container image are you using? Can you install modprobe in your image?
@jperville have you tried apt-get installing the ceph-common package?