rook: Init containers do not ensure ceph OSD has registered credentials in ceph

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

Something caused my cluster to lose OSD credentials for 36 of my 91 OSDs. The operator pod may have been deleted by someone running an improperly scoped command (edit: or, more likely, the node/rack it was on lost connectivity and it was rescheduled). When the operator came back online, it fired off all of the workers to ensure the OSDs were configured. I can only assume something in this phase cleared the OSDs' credentials and then failed before adding them back (I was running from master at the time). The OSD pods do not ensure their credentials are registered with ceph upon startup.

Expected behavior:

The init containers should ensure the OSD’s credentials are registered with ceph.
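A minimal sketch of what such a check could look like, assuming the OSD id is available to the init container and the keyring sits at the usual `/var/lib/ceph/osd/ceph-<id>/keyring` path (names and capability profiles here are illustrative, not Rook's actual implementation):

```sh
# Hypothetical init-container step: re-register the OSD's existing key if the
# mons no longer know about it. OSD_ID and KEYRING are assumed to be provided.
OSD_ID=3
KEYRING=/var/lib/ceph/osd/ceph-${OSD_ID}/keyring

if ! ceph auth get "osd.${OSD_ID}" >/dev/null 2>&1; then
  # Entity is missing from the mon auth database; add it back using the key
  # already on disk, with standard OSD capabilities.
  ceph auth add "osd.${OSD_ID}" \
      mon 'allow profile osd' mgr 'allow profile osd' osd 'allow *' \
      -i "${KEYRING}"
fi
```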

How to reproduce it (minimal and precise):

Delete the credentials for the OSD in ceph, then restart the OSD (see the sketch below). Yes, it sounds crazy, but something in master at the time did cause this.
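For example, from the Rook toolbox, something along these lines reproduces the broken state (assuming the `rook-ceph` namespace and that OSD pods carry a `ceph-osd-id` label; adjust for your deployment):

```sh
# Remove the auth entry for one OSD (here osd.3) from the mon auth database.
ceph auth del osd.3

# Restart the corresponding OSD pod; on startup it can no longer
# authenticate with the mons, and nothing re-registers its key.
kubectl -n rook-ceph delete pod -l ceph-osd-id=3
```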

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 16.04 LTS
  • Kernel (e.g. uname -a): 4.4.0-142-generic
  • Cloud provider or hardware configuration: Specific details upon request.
  • Rook version (use rook version inside of a Rook Pod): v0.9.0-238.g4f97270
  • Kubernetes version (use kubectl version): 1.13.3
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): baremetal deployed via kubespray
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): Now? HEALTH_OK

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 21 (14 by maintainers)

Most upvoted comments

@travisn @leseb This seems like a pretty significant issue that shouldn’t be marked wontfix.