rook: 1.11.0 broken with hostnetwork due to new ports on pod

  • Bug Report

Deviation from expected behavior: With 1.11.0 every pod defines port 6800 (and possibly 6801). If host networking is enabled, the OSDs run on the host network and only one pod can bind a given port on the same network interface, so only one OSD or MGR pod can start on any given node (the MGR pod already claims port 6800).
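For illustration, the conflict comes from a pod spec shaped roughly like the sketch below (the container name and exact layout are assumptions, not the exact spec Rook generates; the port numbers are the ones observed above). With hostNetwork: true, Kubernetes treats each containerPort as a host port, so two such pods cannot be scheduled on the same node:

spec:
  hostNetwork: true
  containers:
    - name: osd            # container name assumed for illustration
      ports:
        - containerPort: 6800
          protocol: TCP
        - containerPort: 6801
          protocol: TCP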

Expected behavior: You should be able to run many OSDs on the same node without Kubernetes refusing to start them due to port unavailability.

How to reproduce it (minimal and precise):

  • Install Rook with spec.network.provider: host on 1.11.0 with multiple OSDs on one host
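A minimal CephCluster sketch for the repro (only the relevant field is shown; the rest of the spec is whatever your cluster normally uses, with enough storage configured to place at least two OSDs on one node):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  network:
    provider: host
  # ...storage and other settings omitted...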

Cluster Status to submit:

# ceph -s
  cluster:
    id:     cb82340a-2eaf-4597-b83e-cc0e62a9d019
    health: HEALTH_WARN
            no active mgr
            Degraded data redundancy: 19877/91515 objects degraded (21.720%), 69 pgs degraded
 
  services:
    mon: 3 daemons, quorum b,c,d (age 19h)
    mgr: no daemons active (since 18h)
    mds: 1/1 daemons up, 1 hot standby
    osd: 7 osds: 6 up (since 60m), 6 in (since 12h)
    rgw: 1 daemon active (1 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 193 pgs
    objects: 30.50k objects, 75 GiB
    usage:   190 GiB used, 5.1 TiB / 5.2 TiB avail
    pgs:     19877/91515 objects degraded (21.720%)
             3090/91515 objects misplaced (3.376%)
             66 active+undersized+degraded
             54 active+clean
             50 active+undersized
             18 stale+active+clean
             3  active+undersized+degraded+remapped+backfilling
             1  stale+active+remapped+backfilling
             1  active+remapped+backfilling

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 22.04.2 LTS
  • Kernel (e.g. uname -a): 5.15.0-60-generic #66-Ubuntu SMP
  • Cloud provider or hardware configuration: 6 bare metal nodes
  • Rook version (use rook version inside of a Rook Pod): rook: v1.11.0-14.gd70d8ad60
  • Storage backend version (e.g. for ceph do ceph -v): ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
  • Kubernetes version (use kubectl version): Server Version: v1.25.6+k3s1
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): k3s
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_WARN no active mgr; Degraded data redundancy: 19877/91515 objects degraded (21.720%), 69 pgs degraded

NOTE: Per #11792 I’m already using rook/ceph:v1.11.0-14.gd70d8ad60. Since that fix worked for the OP and their issue was closed, I’m starting this new issue.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 20 (10 by maintainers)

Most upvoted comments

The mgr-b issue #11791 is the only outstanding issue I still have. When 1.11.1 comes out I’ll test again. I’ll close this issue now.

Appreciate the support.

I’m pretty sure this is related to @sp98’s recent changes to allow cluster multisite. I recall we set a flag to hard-code the OSD ports, but we must have missed the need to not set that flag in host-network environments.

Yes, the issue in the title is already fixed by #11797 and is planned for release tomorrow in v1.11.1. Several different issues have been discussed here such as the mgr readiness probe (tracked by #11791). @reefland Shall we close this issue, or what is remaining after those two specific issues are fixed?

  1. I disabled the ArgoCD auto-heal feature, which would otherwise roll back any manual edits.
  2. I then edited the Deployments for OSD5 and OSD6, which are both on the same node, and removed the ports (see the patch sketch below).
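A sketch of the same edit as a patch, assuming the standard rook-ceph-osd-<id> Deployment names and that the ports are declared on the first container in the pod template; adjust the container index if your generated Deployment differs. Note that the operator may restore the ports the next time it reconciles, so this is only a stopgap:

$ kubectl -n rook-ceph patch deployment rook-ceph-osd-5 --type=json \
    -p='[{"op":"remove","path":"/spec/template/spec/containers/0/ports"}]'
$ kubectl -n rook-ceph patch deployment rook-ceph-osd-6 --type=json \
    -p='[{"op":"remove","path":"/spec/template/spec/containers/0/ports"}]'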

This allowed both OSDs to come online and allowed a mgr to get started:

ceph -s
  cluster:
    id:     cb82340a-2eaf-4597-b83e-cc0e62a9d019
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum b,c,d (age 25h)
    mgr: a(active, since 3m)
    mds: 1/1 daemons up, 1 hot standby
    osd: 7 osds: 7 up (since 3m), 7 in (since 7m); 8 remapped pgs
    rgw: 1 daemon active (1 hosts, 1 zones)

Some endpoints showed up:

$ kubectl get endpoints -n rook-ceph
NAME                             ENDPOINTS             AGE
rook-ceph-exporter               <none>                26h
rook-ceph-mgr                    192.168.10.210:9283   110d
rook-ceph-mgr-dashboard          192.168.10.210:7000   110d
rook-ceph-rgw-ceph-objectstore   192.168.10.219:80     110d

The operator log had a stream of:

2023-03-03 22:59:55.724871 I | op-k8sutil: waiting for all pods with label ceph_daemon_type=mgr,ceph_daemon_id=b to be in running state
2023-03-03 23:00:00.754818 I | op-k8sutil: waiting for all pods with label ceph_daemon_type=mgr,ceph_daemon_id=b to be in running state
2023-03-03 23:00:05.770434 I | op-k8sutil: waiting for all pods with label ceph_daemon_type=mgr,ceph_daemon_id=b to be in running state
2023-03-03 23:00:10.783927 I | op-k8sutil: waiting for all pods with label ceph_daemon_type=mgr,ceph_daemon_id=b to be in running state
2023-03-03 23:00:15.799960 I | op-k8sutil: waiting for all pods with label ceph_daemon_type=mgr,ceph_daemon_id=b to be in running state
2023-03-03 23:00:20.815388 I | op-k8sutil: waiting for all pods with label ceph_daemon_type=mgr,ceph_daemon_id=b to be in running state
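To see which pods the operator is actually waiting on, the labels from the log line can be queried directly, for example:

$ kubectl -n rook-ceph get pods -l ceph_daemon_type=mgr,ceph_daemon_id=b -o wide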

Then I updated the Deployment for OSD4 the same way, removing its ports, to allow the standby mgr to come online:

# ceph -s
  cluster:
    id:     cb82340a-2eaf-4597-b83e-cc0e62a9d019
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum b,c,d (age 26h)
    mgr: a(active, since 15m), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 7 osds: 7 up (since 24s), 7 in (since 19m)
    rgw: 1 daemon active (1 hosts, 1 zones)

That allowed the operator to move forward. I’m not sure what else is important to look for.

This no longer hangs:

ceph fs status

ceph-filesystem - 2 clients
===============
RANK      STATE              MDS            ACTIVITY     DNS    INOS   DIRS   CAPS  
 0        active      ceph-filesystem-a  Reqs:    0 /s  11.9k   102     27      8   
0-s   standby-replay  ceph-filesystem-b  Evts:    0 /s  12.0k   102     27      0   
          POOL              TYPE     USED  AVAIL  
ceph-filesystem-metadata  metadata  1190M  1620G  
 ceph-filesystem-data0      data     519M  1620G  
MDS version: ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)

OK, I have a repro now. I will fix the ports on the mgr pod as well so they are only added when the multi-cluster network option is enabled, similar to #11797 for the OSDs.
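Once that fix is in (v1.11.1 per the comment above), one way to check that the mgr Deployment no longer declares the conflicting ports on a host-network cluster, assuming the usual rook-ceph-mgr-a Deployment name and that the mgr container is the first in the pod template:

$ kubectl -n rook-ceph get deploy rook-ceph-mgr-a \
    -o jsonpath='{.spec.template.spec.containers[0].ports}{"\n"}'

An empty result means no container ports are declared on that container.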