rook: radosgw-admin in rook operator 1.12.4 image fails to run commands against Ceph reef

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

  • The Rook operator fails to reconcile OBCs after upgrading the cluster to Ceph Reef; no OB is created for a newly created OBC.
  • radosgw-admin, when run in a toolbox deployed using the Rook operator 1.12.4 image, hangs indefinitely for invocations that execute normally in a toolbox using the Ceph Reef image.

Expected behavior:

  • The Rook operator reconciles OBCs and creates an OB for a newly created OBC.
  • radosgw-admin, when run in a toolbox deployed using the Rook operator 1.12.4 image, behaves the same as when run from a toolbox using the Ceph Reef image.

How to reproduce it (minimal and precise):

  1. Deploy Rook v1.12.4.
  2. Create a cluster running Ceph Quincy (v17.2.6) with at least one RGW.
  3. Upgrade the cluster to Reef (v18.2.0). The issue is probably also reproducible when deploying Reef directly, but I haven't tested that.
  4. Apply toolbox.yaml and toolbox-operator-image.yaml.
  5. Run radosgw-admin user list --cluster="rook-ceph" --conf="/etc/ceph/ceph.conf" --keyring="/etc/ceph/keyring" (or simply radosgw-admin user list) in both toolbox pods and compare the results (see the sketch after this list).
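
A minimal sketch of steps 4 and 5, assuming the upstream deploy/examples manifests, the default rook-ceph namespace, and that the two manifests create deployments named rook-ceph-tools and rook-ceph-tools-operator-image (verify the names in your copies):

  $ kubectl -n rook-ceph apply -f deploy/examples/toolbox.yaml
  $ kubectl -n rook-ceph apply -f deploy/examples/toolbox-operator-image.yaml

  # Reef-image toolbox: the command returns the user list promptly
  $ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- radosgw-admin user list

  # Operator-image toolbox: the same invocation hangs indefinitely
  $ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools-operator-image -- \
      radosgw-admin user list --cluster="rook-ceph" \
      --conf="/etc/ceph/ceph.conf" --keyring="/etc/ceph/keyring"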

Cluster Status to submit:

  cluster:
    id:     ...
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum h,j,k (age 5h)
    mgr: b(active, since 4h), standbys: a
    mds: 2/2 daemons up, 2 hot standby
    osd: 6 osds: 6 up (since 4d), 6 in (since 3M)
    rgw: 4 daemons active (2 hosts, 4 zones)

  data:
    volumes: 2/2 healthy
    pools:   36 pools, 617 pgs
    objects: 3.12M objects, 9.6 TiB
    usage:   29 TiB used, 29 TiB / 58 TiB avail
    pgs:     615 active+clean
             2   active+clean+scrubbing+deep

Environment:

  • OS (e.g. from /etc/os-release):
    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Debian
    Description:    Debian GNU/Linux 12 (bookworm)
    Release:        12
    Codename:       bookworm
    
  • Kernel (e.g. uname -a):
    $ uname -a
    Linux hostname 6.1.0-12-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.52-1 (2023-09-07) x86_64 GNU/Linux
    
  • Cloud provider or hardware configuration: Not relevant.
  • Rook version (use rook version inside of a Rook Pod):
    $ rook version
    rook: v1.12.4
    go: go1.21.1
    
  • Storage backend version (e.g. for ceph do ceph -v): in operator pod:
    $ ceph -v
    ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
    
    in toolbox pod and most others:
    $ ceph -v
    ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)
    
  • Kubernetes version (use kubectl version):
    $ kubectl version
    Client Version: v1.28.2
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.27.5
    
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Kubernetes on Baremetal deployed using kubespray
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK

About this issue

  • State: closed
  • Created 9 months ago
  • Comments: 25 (14 by maintainers)

Most upvoted comments

Reopening since #13050 was just for the toolbox, but the issue is with the operator

go-ceph is safe to use between higher and lower Ceph versions.

AFAIR, RGW has issues with different versions of radosgw-admin, especially when an older radosgw-admin executes against a newer RGW cluster. This applies to certain commands, but not all. We hit this issue earlier, which is why Rook internally uses the RGW admin ops API via the go-ceph library for its operations.
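
A sketch of how the version skew described above can be confirmed from the two toolbox pods (namespace and deployment names assumed as in the reproduction steps; radosgw-admin --version reports the bundled Ceph version):

  # Older (Quincy 17.2.6) radosgw-admin bundled in the operator 1.12.4 image,
  # running against the upgraded Reef (18.2.0) cluster:
  $ kubectl -n rook-ceph exec deploy/rook-ceph-tools-operator-image -- radosgw-admin --version

  # Reef radosgw-admin from the cluster image, for comparison:
  $ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- radosgw-admin --version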