rook: sometimes csi-cephfsplugin, csi-cephfsplugin-provisioner, csi-rbdplugin and csi-rbdplugin-provisioner are missing

Is this a bug report or feature request?

  • Bug Report

I started using Rook 1.4.3 with Ceph 15.2.4. Sometimes when I deploy a cluster, the components named in the title are missing. The operator logs show:

ceph-csi: invalid csi version. failed to run CmdReporter rook-ceph-csi-detect-version successfully. failed waiting for results ConfigMap rook-ceph-csi-detect-version. failed to start watcher for the results ConfigMap. failed to list the current ConfigMaps in order to start ConfigMap watcher. etcdserver: request timed out
failed to complete ceph CSI version job
github.com/rook/rook/pkg/operator/ceph/csi.validateCSIVersion
        /home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/csi/spec.go:656
github.com/rook/rook/pkg/operator/ceph/csi.ValidateAndConfigureDrivers
        /home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/csi/csi.go:33
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1357

Resetting my nodes resolves the issue.

Deviation from expected behavior:

Expected behavior:

How to reproduce it (minimal and precise):

I don’t know

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
  • Operator’s logs, if necessary
  • Crashing pod(s) logs, if necessary

To get logs, use kubectl -n <namespace> logs <pod name>. When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI. Read the GitHub documentation if you need help.

Environment:

  • OS (e.g. from /etc/os-release):
    NAME="Ubuntu"
    VERSION="18.04.5 LTS (Bionic Beaver)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 18.04.5 LTS"
    VERSION_ID="18.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=bionic
    UBUNTU_CODENAME=bionic
  • Kernel (e.g. uname -a): Linux hezi-3nds-170920202100-vm1 5.4.0-1025-azure #25~18.04.1-Ubuntu SMP Sat Sep 5 15:28:57 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod):
  • Storage backend version (e.g. for ceph do ceph -v):
  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.9", GitCommit:"a17149e1a189050796ced469dbd78d380f2ed5ef", GitTreeState:"clean", BuildDate:"2020-04-16T11:44:51Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.9", GitCommit:"a17149e1a189050796ced469dbd78d380f2ed5ef", GitTreeState:"clean", BuildDate:"2020-04-16T11:36:15Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): kubespray 2.12
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

Looks like this issue should be totally resolved by https://github.com/rook/rook/pull/8607 in Rook v1.8+.

In the logs I see the following:

2022-02-11 01:22:50.016216 E | ceph-csi: invalid csi version. failed to run CmdReporter rook-ceph-csi-detect-version successfully. failed to run job. Internal error occurred: unable to perform UCP DCT image resolve request: Post "https://yhhcpz4fscde4wtx4hltny3jw:4443/api/dct/resolveimage": read tcp 10.99.32.9:57698->10.100.91.2:4443: read: connection reset by peer
failed to complete ceph CSI version job
github.com/rook/rook/pkg/operator/ceph/csi.validateCSIVersion
        /home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/csi/spec.go:736
github.com/rook/rook/pkg/operator/ceph/csi.ValidateAndConfigureDrivers
        /home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/csi/csi.go:49
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1371

After that, there are no further attempts to re-run the CSI version detection.
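
Both operator errors quoted in this issue share the same outer message ("invalid csi version") but have different underlying causes: an etcd timeout in the original report and a failed image resolve request here. As a rough triage aid, the sketch below splits one such log line into its wrapped layers and picks out the innermost one. It assumes the layers are joined by ". " with the underlying cause last, which matches the lines pasted in this issue but is not a documented Rook log format; the function names split_error_chain and root_cause are hypothetical helpers, not part of Rook.

```python
from typing import List

# Assumption: the operator joins wrapped errors with ". " and puts the
# underlying cause last. This matches the log lines quoted in this issue,
# but it is not a documented Rook log format.


def split_error_chain(line: str) -> List[str]:
    """Split one operator error line into its layers, outermost first."""
    # Drop the "<date> <time> E | " prefix when the line has one.
    _, sep, rest = line.partition(" E | ")
    message = rest if sep else line
    return [part.strip() for part in message.split(". ") if part.strip()]


def root_cause(line: str) -> str:
    """Return the innermost (last) layer, or "" for an empty line."""
    layers = split_error_chain(line)
    return layers[-1] if layers else ""


if __name__ == "__main__":
    # Error line taken from the original report in this issue.
    sample = (
        "ceph-csi: invalid csi version. "
        "failed to run CmdReporter rook-ceph-csi-detect-version successfully. "
        "failed waiting for results ConfigMap rook-ceph-csi-detect-version. "
        "failed to start watcher for the results ConfigMap. "
        "failed to list the current ConfigMaps in order to start ConfigMap watcher. "
        "etcdserver: request timed out"
    )
    # Print each layer indented by its depth in the chain.
    for depth, layer in enumerate(split_error_chain(sample)):
        print("  " * depth + layer)
    print("root cause:", root_cause(sample))
```

Reading the last layer first makes it easier to see that the CSI drivers were skipped because of a transient API or registry failure rather than an actual version mismatch.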