rook: rook-ceph-crash-collector-keyring secret not created for crash reporter

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

Expected behavior: On a fresh Kubernetes v1.17 cluster I try to set up the example cluster:

kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster.yaml

My only changes are the rook/ceph:v1.2 image tag for the operator and the following diff to cluster.yaml:

diff --git a/cluster/examples/kubernetes/ceph/cluster.yaml b/cluster/examples/kubernetes/ceph/cluster.yaml
index 460277c..6d04daf 100644
--- a/cluster/examples/kubernetes/ceph/cluster.yaml
+++ b/cluster/examples/kubernetes/ceph/cluster.yaml
@@ -39,7 +39,7 @@ spec:
   # set the amount of mons to be started
   mon:
     count: 3
-    allowMultiplePerNode: false
+    allowMultiplePerNode: true
   # mgr:
     # modules:
     # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
@@ -126,8 +126,8 @@ spec:
 #    osd: rook-ceph-osd-priority-class
 #    mgr: rook-ceph-mgr-priority-class
   storage: # cluster level storage configuration and selection
-    useAllNodes: true
-    useAllDevices: true
+    useAllNodes: false
+    useAllDevices: false
     #deviceFilter:
     config:
       # The default and recommended storeType is dynamically set to bluestore for devices and filestore for directories.
@@ -143,6 +143,12 @@ spec:
     #- path: /var/lib/rook
 # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
 # nodes below will be used as storage resources.  Each node's 'name' field should match their 'kubernetes.io/hostname' label.
+    nodes:
+      - name: "master1"
+        devices:
+          - name: "sdb"
+          - name: "sdc"
+          - name: "sdd"
 #    nodes:
 #    - name: "172.17.4.101"
 #      directories: # specific directories to use for storage can be specified for each node

The crash collector was introduced with version 1.2, and this PR changed the collector to use its own keyring secret.

The issue is that the collector pod can't mount its keyring secret, because the rook-ceph-crash-collector-keyring secret is never created.

Events:
  Type     Reason       Age                   From                      Message
  ----     ------       ----                  ----                      -------
  Normal   Scheduled    6m27s                 default-scheduler         Successfully assigned rook-ceph/rook-ceph-crashcollector-master1.wasc.io-7f94c7bcc7-42lg8 to master1.wasc.io
  Warning  FailedMount  4m24s                 kubelet, master1.wasc.io  Unable to attach or mount volumes: unmounted volumes=[rook-ceph-crash-collector-keyring], unattached volumes=[default-token-78jgn rook-ceph-crash-collector-keyring rook-config-override rook-ceph-log rook-ceph-crash]: timed out waiting for the condition
  Warning  FailedMount  2m8s                  kubelet, master1.wasc.io  Unable to attach or mount volumes: unmounted volumes=[rook-ceph-crash-collector-keyring], unattached volumes=[rook-ceph-crash default-token-78jgn rook-ceph-crash-collector-keyring rook-config-override rook-ceph-log]: timed out waiting for the condition
  Warning  FailedMount  15s (x11 over 6m26s)  kubelet, master1.wasc.io  MountVolume.SetUp failed for volume "rook-ceph-crash-collector-keyring" : secret "rook-ceph-crash-collector-keyring" not found

And indeed, the secret is missing from the rook-ceph namespace:

NAME                                         TYPE                                  DATA   AGE
default-token-78jgn                          kubernetes.io/service-account-token   3      7m47s
rook-ceph-admin-keyring                      kubernetes.io/rook                    1      7m10s
rook-ceph-cmd-reporter-token-2rflk           kubernetes.io/service-account-token   3      7m45s
rook-ceph-config                             kubernetes.io/rook                    2      7m13s
rook-ceph-mgr-token-9d9vw                    kubernetes.io/service-account-token   3      7m45s
rook-ceph-mon                                kubernetes.io/rook                    4      7m13s
rook-ceph-mons-keyring                       kubernetes.io/rook                    1      7m11s
rook-ceph-osd-token-z2bc6                    kubernetes.io/service-account-token   3      7m45s
rook-ceph-system-token-xtdcz                 kubernetes.io/service-account-token   3      7m45s
rook-csi-cephfs-plugin-sa-token-vhkrf        kubernetes.io/service-account-token   3      7m43s
rook-csi-cephfs-provisioner-sa-token-52q5l   kubernetes.io/service-account-token   3      7m43s
rook-csi-rbd-plugin-sa-token-mkq9z           kubernetes.io/service-account-token   3      7m42s
rook-csi-rbd-provisioner-sa-token-mztf8      kubernetes.io/service-account-token   3      7m42s
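
For anyone debugging the same symptom, the operator log is the place to look for why the keyring was never generated. A minimal sketch, assuming the default rook-ceph namespace and the operator deployment name from operator.yaml (the exact log wording varies between Rook versions):

# Re-check whether the keyring secret has appeared yet
kubectl -n rook-ceph get secret rook-ceph-crash-collector-keyring

# Scan the operator log for crash collector / keyring activity and errors
kubectl -n rook-ceph logs deploy/rook-ceph-operator | grep -iE 'crash|keyring'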

How to reproduce it (minimal and precise):

File(s) to submit:

Environment:

  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
  • Kernel (e.g. uname -a):
Linux master1 4.15.0-72-generic #81~16.04.1-Ubuntu SMP Tue Nov 26 16:34:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Cloud provider or hardware configuration: Bare metal
  • Rook version (use rook version inside of a Rook Pod):
rook: v1.2.0
go: go1.11
  • Storage backend version (e.g. for ceph do ceph -v):
ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable)
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"", Minor:"", GitVersion:"v0.0.0-master+$Format:%h$", GitCommit:"$Format:%H$", GitTreeState:"", BuildDate:"1970-01-01T00:00:00Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): baremetal with kubeadm
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): The cluster is not created yet

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 30 (9 by maintainers)

Most upvoted comments

@JieZeng1993 kubectl -n rook-ceph create secret generic rook-ceph-crash-collector-keyring

I think I figured it out. Writing this down here for others to find. I had done several tests installing Rook Ceph in Kubernetes. Cleaning a system for a new run takes more than just removing the /var/lib/rook directory; I also found some sockets in the /var/lib/kubelet directory related to Rook Ceph. On my system it worked after removing those too.

This method works. The three related paths are: (1) /var/lib/rook/ (2) /var/lib/kubelet/plugins/ (3) /var/lib/kubelet/plugins_registry/. After you clean the files in these three directories, remember to reinstall the Rook services, otherwise it doesn't work.
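
For reference, here is a minimal sketch of the cleanup the two comments above describe. It assumes the default dataDirHostPath of /var/lib/rook and that the leftover CSI directories under /var/lib/kubelet are the rook-ceph ones; run it on every node that hosted Rook before redeploying:

# Run on each node that previously ran Rook/Ceph
sudo rm -rf /var/lib/rook                                 # dataDirHostPath: old mon/osd state and keyrings
sudo rm -rf /var/lib/kubelet/plugins/rook-ceph*           # leftover CSI plugin sockets (name depends on driver/namespace)
sudo rm -rf /var/lib/kubelet/plugins_registry/rook-ceph*  # leftover CSI registration sockets

# Then redeploy: common.yaml, operator.yaml, cluster.yaml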

As usual, I forgot to zap my disks and clean the /var/lib/rook directory before recreating the cluster! I had encryption enabled before, and the mons freaked out about it, of course. Thanks to everyone for their help!

@alter this is not always the case; sometimes it is just a symptom.

A common case is when you deploy the cluster and for some reason remove it without properly purging the servers (the dataDirHostPath folder).

I had a mixed problem: when I used your solution to work around the secret issue, it also "blocked" the mgr pod from even being created (something with "…field is immutable.").

When I removed the secret (and restarted the operator pod), everything started to work as expected.

The real fix for the initial problem was purging the dataDirHostPath folder on each server.

Hope it helps anybody 😃
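
For completeness, a sketch of the recovery described in this comment, assuming the default rook-ceph namespace and the operator deployment name from operator.yaml (deleting the operator pod works just as well as a rollout restart):

# Remove the manually created secret so the operator can recreate the keyring itself
kubectl -n rook-ceph delete secret rook-ceph-crash-collector-keyring

# Restart the operator so it reconciles the cluster again
kubectl -n rook-ceph rollout restart deploy/rook-ceph-operator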

@JieZeng1993 kubectl -n rook-ceph create secret generic rook-ceph-crash-collector-keyring

I had to use this:

kubectl -n rook-ceph create secret generic --type kubernetes.io/rook rook-ceph-crash-collector-keyring
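
If you are unsure which variant you ended up with, the secret's type distinguishes them (kubectl create secret generic defaults to Opaque when --type is omitted):

# Print the type of the manually created secret
kubectl -n rook-ceph get secret rook-ceph-crash-collector-keyring -o jsonpath='{.type}'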