rook: rook-ceph-crash-collector-keyring secret not created for crash reporter
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior:
Expected behavior: On a fresh Kubernetes v1.17 cluster I try to set up an example cluster:
```
kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster.yaml
```
My only changes are the rook/ceph:v1.2 tag for the operator and the following diff:
```diff
diff --git a/cluster/examples/kubernetes/ceph/cluster.yaml b/cluster/examples/kubernetes/ceph/cluster.yaml
index 460277c..6d04daf 100644
--- a/cluster/examples/kubernetes/ceph/cluster.yaml
+++ b/cluster/examples/kubernetes/ceph/cluster.yaml
@@ -39,7 +39,7 @@ spec:
# set the amount of mons to be started
mon:
count: 3
- allowMultiplePerNode: false
+ allowMultiplePerNode: true
# mgr:
# modules:
# Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
@@ -126,8 +126,8 @@ spec:
# osd: rook-ceph-osd-priority-class
# mgr: rook-ceph-mgr-priority-class
storage: # cluster level storage configuration and selection
- useAllNodes: true
- useAllDevices: true
+ useAllNodes: false
+ useAllDevices: false
#deviceFilter:
config:
# The default and recommended storeType is dynamically set to bluestore for devices and filestore for directories.
@@ -143,6 +143,12 @@ spec:
#- path: /var/lib/rook
# Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
# nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
+ nodes:
+ - name: "master1"
+ devices:
+ - name: "sdb"
+ - name: "sdc"
+ - name: "sdd"
# nodes:
# - name: "172.17.4.101"
# directories: # specific directories to use for storage can be specified for each node
```
The crash collector was introduced with version 1.2, and this PR changed the collector to use its own secret. The issue is that the collector pod can't mount the Ceph secret, because the secret doesn't exist.
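A quick way to confirm this (the `app=rook-ceph-crashcollector` label is an assumption based on Rook's usual labelling, not taken from the report):

```bash
# Check whether the keyring secret exists at all.
kubectl -n rook-ceph get secret rook-ceph-crash-collector-keyring

# Inspect a crash collector pod; the mount failure shows up in its events.
kubectl -n rook-ceph describe pod -l app=rook-ceph-crashcollector
```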
Events:
```
Type     Reason       Age                   From                      Message
----     ------       ----                  ----                      -------
Normal   Scheduled    6m27s                 default-scheduler         Successfully assigned rook-ceph/rook-ceph-crashcollector-master1.wasc.io-7f94c7bcc7-42lg8 to master1.wasc.io
Warning  FailedMount  4m24s                 kubelet, master1.wasc.io  Unable to attach or mount volumes: unmounted volumes=[rook-ceph-crash-collector-keyring], unattached volumes=[default-token-78jgn rook-ceph-crash-collector-keyring rook-config-override rook-ceph-log rook-ceph-crash]: timed out waiting for the condition
Warning  FailedMount  2m8s                  kubelet, master1.wasc.io  Unable to attach or mount volumes: unmounted volumes=[rook-ceph-crash-collector-keyring], unattached volumes=[rook-ceph-crash default-token-78jgn rook-ceph-crash-collector-keyring rook-config-override rook-ceph-log]: timed out waiting for the condition
Warning  FailedMount  15s (x11 over 6m26s)  kubelet, master1.wasc.io  MountVolume.SetUp failed for volume "rook-ceph-crash-collector-keyring" : secret "rook-ceph-crash-collector-keyring" not found
```
And indeed, `kubectl -n rook-ceph get secrets` shows it is missing:
```
NAME                                         TYPE                                  DATA   AGE
default-token-78jgn                          kubernetes.io/service-account-token   3      7m47s
rook-ceph-admin-keyring                      kubernetes.io/rook                    1      7m10s
rook-ceph-cmd-reporter-token-2rflk           kubernetes.io/service-account-token   3      7m45s
rook-ceph-config                             kubernetes.io/rook                    2      7m13s
rook-ceph-mgr-token-9d9vw                    kubernetes.io/service-account-token   3      7m45s
rook-ceph-mon                                kubernetes.io/rook                    4      7m13s
rook-ceph-mons-keyring                       kubernetes.io/rook                    1      7m11s
rook-ceph-osd-token-z2bc6                    kubernetes.io/service-account-token   3      7m45s
rook-ceph-system-token-xtdcz                 kubernetes.io/service-account-token   3      7m45s
rook-csi-cephfs-plugin-sa-token-vhkrf        kubernetes.io/service-account-token   3      7m43s
rook-csi-cephfs-provisioner-sa-token-52q5l   kubernetes.io/service-account-token   3      7m43s
rook-csi-rbd-plugin-sa-token-mkq9z           kubernetes.io/service-account-token   3      7m42s
rook-csi-rbd-provisioner-sa-token-mztf8      kubernetes.io/service-account-token   3      7m42s
```
How to reproduce it (minimal and precise): apply common.yaml, operator.yaml, and cluster.yaml with the diff above on a fresh v1.17 cluster.
Environment:
- OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
- Kernel (e.g. `uname -a`):
```
Linux master1 4.15.0-72-generic #81~16.04.1-Ubuntu SMP Tue Nov 26 16:34:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
```
- Cloud provider or hardware configuration: Bare metal
- Rook version (use `rook version` inside of a Rook Pod):
```
rook: v1.2.0
go: go1.11
```
- Storage backend version (e.g. for ceph do `ceph -v`):
```
ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable)
```
- Kubernetes version (use `kubectl version`):
```
Client Version: version.Info{Major:"", Minor:"", GitVersion:"v0.0.0-master+$Format:%h$", GitCommit:"$Format:%H$", GitTreeState:"", BuildDate:"1970-01-01T00:00:00Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
```
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): baremetal with kubeadm
- Storage backend status (e.g. for Ceph use `ceph health` in the Rook Ceph toolbox): the cluster is not created yet
About this issue
- State: closed
- Created 5 years ago
- Comments: 30 (9 by maintainers)
Commits related to this issue
- ceph: fix external cluster crash k8s secret The external cluster now creates the crash key when appropriate. Closes: https://github.com/rook/rook/issues/4553 Signed-off-by: Sébastien Han <seb@redhat... — committed to leseb/rook by leseb 4 years ago
- ceph: fix external cluster crash k8s secret The external cluster now creates the crash key when appropriate. Closes: https://github.com/rook/rook/issues/4553 Signed-off-by: Sébastien Han <seb@redhat... — committed to rook/rook by leseb 4 years ago
- ceph: fix external cluster crash k8s secret The external cluster now creates the crash key when appropriate. Closes: https://github.com/rook/rook/issues/4553 Signed-off-by: Sébastien Han <seb@redhat... — committed to binoue/rook by leseb 4 years ago
@JieZeng1993
```
kubectl -n rook-ceph create secret generic rook-ceph-crash-collector-keyring
```
This method works! The three related paths are: (1) /var/lib/rook/, (2) /var/lib/kubelet/plugins/, (3) /var/lib/kubelet/plugins_registry/. After you clean the files in these three directories, remember to reinstall the Rook service; otherwise it doesn't work.
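A rough per-node cleanup sketch of the steps described above; the `*ceph*` globs are an assumption to avoid wiping unrelated CSI plugins, so adjust them to what is actually on your nodes:

```bash
# Run on every node that hosted Rook pods. This destroys all Rook state!
sudo rm -rf /var/lib/rook/

# Remove leftover Rook/Ceph CSI sockets and registrations.
sudo sh -c 'rm -rf /var/lib/kubelet/plugins/*ceph* /var/lib/kubelet/plugins_registry/*ceph*'
```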
As usual, I forgot to zap my disks and the /var/lib/rook directory before recreating the cluster! I had encryption enabled before, and the mons freaked out about it of course! Thanks to everyone and their help!

I think I figured it out. Writing this down here for others to find. I had done several tests installing Rook Ceph in Kubernetes. Cleaning a system for a new run takes more than just removing the /var/lib/rook directory. I also found some sockets in the /var/lib/kubelet directory related to Rook Ceph; on my system it worked after removing these too.
@alter this is not always the case; sometimes it is just a symptom.
The common case is deploying the cluster and then, for some reason, removing it without properly purging the servers (the dataDirHostPath folder).
I had a mixed problem: when I used your solution to fix the secret issue, the secret also blocked the mgr pod from even getting created (something with "…field is immutable.").
When I removed the secret (and restarted the operator pod), everything started to work as expected.
The actual fix for the initial problem was purging the dataDirHostPath folder on each server.
Hope it helps anybody 😃
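For the disk side of the purge, a minimal wipe sketch along the lines of Rook's cleanup docs, assuming the OSD devices from the cluster spec above (/dev/sdb, /dev/sdc, /dev/sdd); this destroys all data on those disks:

```bash
# Wipe former OSD disks so a fresh cluster can consume them. DATA LOSS!
for disk in /dev/sdb /dev/sdc /dev/sdd; do
  sudo sgdisk --zap-all "$disk"                                  # clear GPT/MBR partition structures
  sudo dd if=/dev/zero of="$disk" bs=1M count=100 oflag=direct   # zero the start so old Ceph signatures are gone
done
```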
I had to use this:
```
kubectl -n rook-ceph create secret generic --type kubernetes.io/rook rook-ceph-crash-collector-keyring
```
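After creating the placeholder secret, something like the following can confirm the collector comes up (the label and the operator deployment name are assumptions based on Rook's usual naming):

```bash
# Watch the crash collector pods come up.
kubectl -n rook-ceph get pods -l app=rook-ceph-crashcollector -w

# If nothing changes, bounce the operator so it reconciles again.
kubectl -n rook-ceph rollout restart deploy/rook-ceph-operator
```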