rook: Nodes defined in cluster are not getting initialized by the operator

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior: I have 14 nodes defined in rook cluster:

(⎈ |nautilus:rook)Dmitrys-MacBook-Pro-2:jupyterhub dimm$ kubectl get cluster rook -o yaml | grep "^      name:"
      name: k8s-gpu-03.sdsc.optiputer.net
      name: fiona.nwsc.ucar.edu
      name: k8s-epyc-01.sdsc.optiputer.net
      name: k8s-nvme-01.sdsc.optiputer.net
      name: fiona.tools.ucla.net
      name: fiona-dtn-1.ucsc.edu
      name: fiona-dtn.usc.edu
      name: k8s-nvme-01.ultralight.org
      name: netw-fiona.stanford.edu
      name: fiona.its.hawaii.edu
      name: dtn-main.ucr.edu
      name: ps-100g.sdsu.edu
      name: siderea.ucsc.edu
      name: knuron.calit2.optiputer.net

Note knuron.calit2.optiputer.net and k8s-gpu-03.sdsc.optiputer.net in that list. Now compare with the hosts in the CRUSH map:

[root@osg /]# ceph osd tree | grep host
-53       150.79706     host dtn-main-ucr-edu
-25         2.77475     host fiona-dtn-1-ucsc-edu
-37         2.77475     host fiona-dtn-usc-edu
-49         2.77475     host fiona-its-hawaii-edu
 -7         2.77475     host fiona-nwsc-ucar-edu
-21       174.30396     host fiona-tools-ucla-net
-10        75.06427     host k8s-epyc-01-sdsc-optiputer-net
-17       131.46848     host k8s-nvme-01-sdsc-optiputer-net
-41         6.92201     host k8s-nvme-01-ultralight-org
-45       169.26126     host netw-fiona-stanford-edu
-29         5.66150     host ps-100g-sdsu-edu
-33         5.66150     host siderea-ucsc-edu

Those 2 hosts are not defined in the CRUSH map.

There are no configmaps for those 2 hosts in the rook namespace, only some device configmaps for them in rook-system. The rook-operator logs say nothing about those 2 hosts.
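A quick way to spot this kind of drift is to diff the CRD node list against the CRUSH host buckets. A minimal sketch (the `missing_hosts` helper is hypothetical), assuming the two lists have been saved to files one name per line, and noting that CRUSH bucket names use dashes where the FQDN has dots:

```shell
#!/usr/bin/env bash
# missing_hosts CRD_FILE CRUSH_FILE
#   CRD_FILE:   one node FQDN per line, as listed in the cluster CRD
#   CRUSH_FILE: one host bucket name per line, from `ceph osd tree | grep host`
# Prints nodes that appear in the CRD but have no host bucket in the CRUSH map.
missing_hosts() {
  # Normalize the FQDNs (dots -> dashes) so they match CRUSH bucket names,
  # then print lines unique to the CRD list.
  comm -23 <(tr '.' '-' < "$1" | sort) <(sort "$2")
}
```

Run against the two listings above, this would print the normalized names of the two missing hosts.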

Expected behavior: On startup, the operator should verify that all nodes defined in the cluster CRD exist and have their OSDs initialized.

How to reproduce it (minimal and precise): Can’t reproduce. The operator apparently deleted these 2 nodes while it was broken after the upgrade.

Environment:

  • Cloud provider or hardware configuration: baremetal
  • Rook version (use rook version inside of a Rook Pod): 0.8.1
  • Kubernetes version (use kubectl version): 1.11.2
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): kubeadm
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): healthy

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 18 (16 by maintainers)

Most upvoted comments

Added OSD tolerations to cluster, now all nodes are in. Thanks!
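The “OSD tolerations” mentioned here go under `placement.osd` in the cluster CRD. A minimal sketch, assuming a Rook 0.8-style cluster spec; the taint key `example/dedicated` is a placeholder for whatever taint the skipped nodes actually carry:

```yaml
spec:
  placement:
    osd:
      tolerations:
      # Placeholder taint key; replace with the taint actually set on the
      # nodes that the operator skipped.
      - key: example/dedicated
        operator: Exists
        effect: NoSchedule
```

Without a matching toleration, the OSD pods cannot be scheduled on tainted nodes, which would explain why those nodes never showed up as hosts in the CRUSH map.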

I did restart the operator multiple times, with no effect.