rook: Failed to create CephObjectStore, object store deployments, pools, crush rules

Is this a bug report or feature request?

  • Bug Report

Expected behavior:

We can create thousands of CephObjectStore instances.

How to reproduce it (minimal and precise):

  1. Deploy the Rook operator with Helm chart v1.9.9.
  2. Deploy the Ceph cluster with Helm chart v1.9.9.
  3. Create 50 CephObjectStore instances (a bulk-creation sketch follows this list).
  4. Creation fails starting from roughly the 43rd instance.
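For context, a minimal sketch of how the stores can be created in bulk, assuming a plain kubectl loop; the pool sizes, gateway settings, and name pattern are illustrative, not taken from the report:

# Hypothetical bulk creation; each CephObjectStore creates its own RGW pools,
# and each pool gets its own CRUSH rule.
for i in $(seq 1 50); do
  cat <<EOF | kubectl apply -f -
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: ceph-objectstore-${i}
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  dataPool:
    failureDomain: host
    replicated:
      size: 3
  preservePoolsOnDelete: false
  gateway:
    port: 80
    instances: 1
EOF
done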

Logs to submit:

  • Operator’s logs, if necessary

E | ceph-object-controller: failed to reconcile CephObjectStore "rook-ceph/ceph-objectstore-46". failed to create object store deployments: failed to create object pools: failed to create metadata pools: failed to create pool "ceph-objectstore-46.rgw.control": failed to create replicated crush rule "ceph-objectstore-46.rgw.control": failed to create crush rule ceph-objectstore-46.rgw.control: exit status 28

Experimentally, we found that a 257th CRUSH rule cannot be created (rule ids 256 and above are rejected), while more than 256 pools and erasure-code profiles are created normally.

# ceph osd crush rule create-erasure test-10 test-10.data_ecprofile
Error ENOSPC: failed to add rule 256 because (28) No space left on device
# ceph osd crush rule create-replicated test-10 default host
Error ENOSPC: failed to add rule 256 because (28) No space left on device
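The cap matches the 256 CRUSH rules already present in the cluster; the 8-bit rule id that pools reference is presumably the reason for the limit. A quick check from the toolbox, using standard Ceph CLI commands:

# Count existing CRUSH rules; rule creation fails once this reaches 256
ceph osd crush rule ls | wc -l
ceph osd crush rule dump | grep -c '"rule_id"'
# Pools and erasure-code profiles are not bound by the same cap
ceph osd pool ls | wc -l
ceph osd erasure-code-profile ls | wc -l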

Cluster Status to submit:

$ ceph -s
  cluster:
    id:     33b3053d-5921-4720-8155-8e55879ed82f
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 52m)
    mgr: b(active, since 43m)
    mds: 1/1 daemons up, 1 hot standby
    osd: 16 osds: 16 up (since 44m), 16 in (since 44m)
    rgw: 41 daemons active (4 hosts, 41 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   256 pools, 3121 pgs
    objects: 13.48k objects, 1.1 MiB
    usage:   3.4 GiB used, 1.6 TiB / 1.6 TiB avail
    pgs:     3121 active+clean
 
  io:
    client:   60 KiB/s rd, 3.1 KiB/s wr, 71 op/s rd, 32 op/s wr

Environment:

  • OS (e.g. from /etc/os-release): Oracle Linux Server 8.6
  • Kernel (e.g. uname -a): 5.4.17-2136.308.9.el8uek.x86_64
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): rook: v1.9.9 go: go1.17.12
  • Storage backend version (e.g. for ceph do ceph -v): ceph version 16.2.10
  • Kubernetes version (use kubectl version): v1.22.5
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Installed on virtual machines with Kubespray
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_WARN 1 pools have many more objects per pg than average

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 36 (15 by maintainers)

Most upvoted comments

Yes, this works great! 2 isolated object stores in one set of pools.

@cbodley, @travisn thank you!

Created Feature Request.

Ceph must have that limit because it’s really not designed to run with hundreds of pools and so many PGs. Are you creating so many object stores because you need multitenancy? If so, I’d suggest breaking it down into smaller rook clusters and have multiple rook clusters per K8s cluster.
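For illustration, running several smaller Rook clusters under one operator could look roughly like the following; the extra namespace, release name, and chart value paths are assumptions for the sketch, not taken from the thread:

# Hypothetical second cluster namespace managed by the existing operator
helm repo add rook-release https://charts.rook.io/release
kubectl create namespace rook-ceph-secondary
helm install rook-ceph-cluster-secondary rook-release/rook-ceph-cluster \
  --namespace rook-ceph-secondary \
  --set operatorNamespace=rook-ceph \
  --set cephClusterSpec.dataDirHostPath=/var/lib/rook-secondary

Each cluster then keeps its own pool and CRUSH rule count well under the 256-rule limit, at the cost of separate mon/mgr/OSD sets per cluster.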