rook: manually repair an OSD that stays down after a k8s node restart

I am facing a problem where an OSD goes down after any k8s node is restarted. The OSD is supposed to come back up on its own, but that is not happening for me. The problem is described in more detail here: https://github.com/rook/rook/issues/1278

Here I want to discuss possible ways to bring the OSD back up manually.

ceph osd tree
ID  CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF
 -1       0.46196 root default
 -2       0.09239     host 10-1-29-31
  1   hdd 0.09239         osd.1           up  1.00000 1.00000
-11       0.09239     host 10-1-29-32
  4   hdd 0.09239         osd.4           up  1.00000 1.00000
 -3       0.09239     host 10-1-29-33
  0   hdd 0.09239         osd.0           up  1.00000 1.00000
 -9       0.09239     host 10-1-29-34
  2   hdd 0.09239         osd.2         down        0 1.00000
 -4       0.09239     host 10-1-29-35
  3   hdd 0.09239         osd.3           up  1.00000 1.00000
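
In the tree above, osd.2 on host 10-1-29-34 is down and its reweight is 0, i.e. it has already been marked out automatically. Before attempting any repair it can help to check how much data is affected while it is out; a minimal sketch, run from the rook toolbox pod or any shell with the admin keyring:

# overall cluster state; degraded/undersized PGs mean data is waiting on osd.2
ceph -s
# per-PG detail for anything that is not active+clean
ceph health detail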

I have tried the following methods found in the documentation:

ceph osd repair osd.2
Error EAGAIN: osd.2 is not up
ceph osd up osd.2
no valid command found; 10 closest matches:
osd count-metadata <property>
osd versions
osd find <osdname (id|osd.id)>
osd metadata {<osdname (id|osd.id)>}
osd getmaxosd
osd ls-tree {<int[0-]>} {<name>}
osd getmap {<int[0-]>}
osd getcrushmap {<int[0-]>}
osd tree {<int[0-]>} {up|down|in|out|destroyed [up|down|in|out|destroyed...]}
osd ls {<int[0-]>}
Error EINVAL: invalid command
ceph-volume simple activate osd.2
RuntimeError: Expected JSON config path not found: /etc/ceph/osd/osd.2-None.json

All of them lead to errors and do not give the expected result. Maybe I'm moving in the wrong direction?
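
Since rook runs each OSD inside its own pod rather than as a host-level service, running ceph-volume outside that pod will generally not find the JSON config it expects under /etc/ceph/osd, which is likely why the last command fails. It is usually more useful to look at the OSD pod itself first; a minimal sketch, assuming the default rook-ceph namespace and the standard app=rook-ceph-osd label (both may differ in your deployment):

# list OSD pods and check whether the one backing osd.2 is running at all
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide
# inspect its logs for the reason it stays down (pod name is deployment-specific)
kubectl -n rook-ceph logs <osd-2-pod-name>
# ask ceph where osd.2 was last seen (this command appears in the matches above)
ceph osd find 2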

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 22 (16 by maintainers)

Most upvoted comments

We think we may have found a manual recovery method.

  1. Spin up a fresh 3-OSD cluster
  2. Down a k8s node long enough for the related OSD to auto-out
  3. Bring the node back up and wait for it to reschedule all pods
  4. ceph osd rm the offending OSD
  5. Kill the pod which was running the offending OSD

The pod will be rescheduled and almost immediately the OSD will be UP and IN. Cluster remains fully RW to radosgw traffic during this entire process. This has only been tested with Replicated data pools so far, testing EC pools next.
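
In command form, steps 4 and 5 of the list above come down to something like the following; a sketch assuming osd.2 is the auto-outed OSD and the default rook-ceph namespace (the pod name will differ per deployment):

# step 4: remove the down+out OSD from the osdmap so it can re-register cleanly
ceph osd rm osd.2
# step 5: delete the pod that was running osd.2; rook reschedules it and the OSD rejoins
kubectl -n rook-ceph delete pod <rook-ceph-osd-2-pod-name>
# verify it comes back up and in
ceph osd tree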

This is less than ideal, as it requires manual intervention each time an OSD auto-outs. We would definitely like to work with the team to share any information we’ve found to help fix this.

Since manual OSD recovery is now possible, this issue can be closed.