rook: PGs are stuck in "inactive" state after cluster initialisation
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior:
After initialising a test single-node bare-metal cluster, checking health with ceph health detail reveals an unhealthy cluster state:
HEALTH_WARN Reduced data availability: 1 pgs inactive
[WRN] PG_AVAILABILITY: Reduced data availability: 1 pgs inactive
pg 1.0 is stuck inactive for 1h, current state unknown, last acting []
Further attempts to create a StorageClass and a PVC result in the PVC being stuck in Pending.
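For anyone hitting the same state, a few extra read-only diagnostics that may help narrow down why the PG is inactive (standard Ceph CLI; the PG ID 1.0 is taken from the health output above):
# run inside the rook-ceph-tools pod
ceph health detail            # repeats the warning together with the affected PG IDs
ceph pg dump_stuck inactive   # list every PG stuck in an inactive state
ceph pg 1.0 query             # peering details for the stuck PG (may fail while its acting set is empty)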
Expected behavior:
Healthy cluster with functioning PVCs.
How to reproduce it (minimal and precise):
# initialise Rook on a fresh k8s cluster as per the documentation:
kubectl create -f rook/cluster/examples/kubernetes/ceph/common.yaml
kubectl create -f rook/cluster/examples/kubernetes/ceph/operator.yaml
kubectl create -f rook/cluster/examples/kubernetes/ceph/cluster-test.yaml
# deploy toolbox:
kubectl apply -f rook/cluster/examples/kubernetes/ceph/toolbox.yaml
# log into the toolbox and query cluster health:
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
# note: right after the cluster was created, 1 inactive PG was reported;
# after leaving the cluster alone for a few hours, there are 33 of them
> ceph -s
  cluster:
    id:     bd9c4d9d-7fcc-4771-82e5-aca2dd144575
    health: HEALTH_WARN
            Reduced data availability: 33 pgs inactive

  services:
    mon: 1 daemons, quorum a (age 5h)
    mgr: a(active, since 5h)
    osd: 1 osds: 1 up (since 5h), 1 in (since 5h)

  data:
    pools:   2 pools, 33 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             33 unknown
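The 0 B total capacity together with 100% of PGs being unknown suggests the single OSD might not be reporting usage, or might not be selectable by CRUSH at all. The checks below (standard Ceph CLI, run from the toolbox) should show whether the OSD has a non-zero CRUSH weight and the expected device class, and whether the pools' CRUSH rules can actually map to it:
ceph osd tree             # OSD up/in state, CRUSH weight and device class
ceph osd df               # per-OSD capacity as reported to the cluster
ceph osd pool ls detail   # size, min_size and crush_rule for both pools
ceph osd crush rule dump  # the rules referenced by the pools above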
File(s) to submit:
Environment:
- OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"
- Kernel (e.g. uname -a): Linux kmaster 4.18.0-193.19.1.el8_2.x86_64 #1 SMP Mon Sep 14 14:37:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Cloud provider or hardware configuration: Bare-metal single-node cluster based on an Intel NUC 10 (please let me know if further details are required). Single hard drive, with a raw partition created for Ceph.
- Rook version (use rook version inside of a Rook Pod):
  rook: v1.4.5
  go: go1.13.8
- Storage backend version (e.g. for ceph do ceph -v): ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:41:49Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Bare metal
- Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): see output above
@Juriy Have you sorted this out? I’m a bit confused by the outputs:
I see you have a rule that looks correct to me, so I'm not sure what's going on.
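If someone lands here with the same symptom and the CRUSH rule also looks fine, two more things may be worth checking (a hedged sketch; the mgr name "a" and the rook-ceph-osd-0 deployment name are assumed from the default single-node layout above and may differ):
# confirm the active mgr is healthy; PG states stay "unknown" if the mgr is not receiving stats
ceph mgr stat
# check the OSD pod logs for errors that would stop it from reporting in
kubectl -n rook-ceph logs deploy/rook-ceph-osd-0
# as a last resort, failing over the mgr forces PG stats to be re-reported
ceph mgr fail a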