rook: ceph: failed to start OSD in expand-bluefs init container

Is this a bug report or feature request?

  • Bug Report

Restarting an OSD pod after crashing the ceph-osd daemon fails in the expand-bluefs init container.

Deviation from expected behavior:

The expand-bluefs init container failed. Here is the top of the container's log (the full log is in the "File(s) to submit" section).

inferring bluefs devices from bluestore path
2020-11-02T10:51:57.777+0000 7ff95fc5e240 -1 bluefs _check_new_allocations invalid extent 1: 0xa04d0000~10000: duplicate reference, ino 68
2020-11-02T10:51:57.777+0000 7ff95fc5e240 -1 bluefs mount failed to replay log: (14) Bad address
2020-11-02T10:51:57.777+0000 7ff95fc5e240 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_bluefs failed bluefs mount: (14) Bad address
...

I suspect that calling ceph-bluestore-tool bluefs-bdev-expand corrupts the OSD if it is in an inconsistent state.
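For reference, the expand-bluefs init container essentially runs a command along these lines against the OSD data directory (a sketch; the exact invocation is generated by Rook and may differ):

$ ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0

The "inferring bluefs devices from bluestore path" line at the top of the log is output from this command, and the FAILED ceph_assert in BlueStore::expand_devices() further down is where it aborts.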

Expected behavior:

The OSD pod starts successfully.

How to reproduce it (minimal and precise):

  1. Create a healthy OSD pod. Here I assume OSD 0.
  2. Run the following bash script (on the node hosting the OSD, with kubectl access to the cluster).

for ((;;)) ; do
  # Retry until a running ceph-osd daemon is actually killed; right after a
  # pod restart the process may not be up yet.
  for ((;;)) ; do
    sudo kill -KILL $(ps ax | grep ceph-osd | awk '/foreground/{print $1}') 2>/dev/null && break
  done
  sleep 1
  # Delete the pod so Kubernetes recreates it and the init containers
  # (including expand-bluefs) run again.
  kubectl -n rook-ceph delete pod -lceph-osd-id="0"
done

Restarting the OSD pod will start failing in the expand-bluefs init container within several minutes.

Here is what I observed after step 2.

$ kubectl -n rook-ceph get pod -w
...
rook-ceph-osd-0-58c8d8ccf8-49zcz                0/1     Terminating       1          34s
rook-ceph-osd-0-58c8d8ccf8-49zcz                0/1     Terminating       1          35s
rook-ceph-osd-0-58c8d8ccf8-49zcz                0/1     Terminating       1          35s
rook-ceph-osd-0-58c8d8ccf8-s8sh9                0/1     Error             0          30s
rook-ceph-osd-0-58c8d8ccf8-s8sh9                0/1     Terminating       0          30s
rook-ceph-osd-0-58c8d8ccf8-96wtd                0/1     Pending           0          0s
rook-ceph-osd-0-58c8d8ccf8-96wtd                0/1     Pending           0          0s
rook-ceph-osd-0-58c8d8ccf8-96wtd                0/1     Init:0/4          0          0s
rook-ceph-osd-0-58c8d8ccf8-s8sh9                0/1     Terminating       0          32s
rook-ceph-osd-0-58c8d8ccf8-s8sh9                0/1     Terminating       0          32s
rook-ceph-osd-0-58c8d8ccf8-96wtd                0/1     Init:1/4          0          2s
rook-ceph-osd-0-58c8d8ccf8-96wtd                0/1     Init:2/4          0          3s
rook-ceph-osd-0-58c8d8ccf8-96wtd                0/1     Init:Error        0          4s
rook-ceph-osd-0-58c8d8ccf8-96wtd                0/1     Init:Error        1          5s
rook-ceph-osd-0-58c8d8ccf8-96wtd                0/1     Init:CrashLoopBackOff   1          6s
rook-ceph-osd-0-58c8d8ccf8-96wtd                0/1     Init:2/4                2          18s
rook-ceph-osd-0-58c8d8ccf8-96wtd                0/1     Init:Error              2          19s
rook-ceph-osd-0-58c8d8ccf8-96wtd                0/1     Init:CrashLoopBackOff   2          31s

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  dataDirHostPath: /var/lib/rook
  mon:
    count: 1
    allowMultiplePerNode: false
  cephVersion:
    image: ceph/ceph:v15.2.4
    allowUnsupported: false
  skipUpgradeChecks: false
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  dashboard:
    enabled: false
    ssl: true
  network:
    hostNetwork: false
  crashCollector:
    disable: true
  storage:
    storageClassDeviceSets:
    - name: set1
      count: 1
      portable: false
      tuneSlowDeviceClass: true
      placement:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - rook-ceph-osd
                - key: app
                  operator: In
                  values:
                  - rook-ceph-osd-prepare
              topologyKey: kubernetes.io/hostname
      resources:
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 5Gi
          storageClassName: manual
          volumeMode: Block
          accessModes:
            - ReadWriteOnce
  • Crashing pod(s) logs, if necessary
$ kubectl -n rook-ceph logs rook-ceph-osd-0-58c8d8ccf8-96wtd -c expand-bluefs
inferring bluefs devices from bluestore path
2020-11-02T10:51:57.777+0000 7ff95fc5e240 -1 bluefs _check_new_allocations invalid extent 1: 0xa04d0000~10000: duplicate reference, ino 68
2020-11-02T10:51:57.777+0000 7ff95fc5e240 -1 bluefs mount failed to replay log: (14) Bad address
2020-11-02T10:51:57.777+0000 7ff95fc5e240 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_bluefs failed bluefs mount: (14) Bad address
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.4/rpm/el8/BUILD/ceph-15.2.4/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::expand_devices(std::ostream&)' thread 7ff95fc5e240 time 2020-11-02T10:51:58.079649+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.4/rpm/el8/BUILD/ceph-15.2.4/src/os/bluestore/BlueStore.cc: 6837: FAILED ceph_assert(r == 0)
 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7ff955d7eafc]
 2: (()+0x276d16) [0x7ff955d7ed16]
 3: (BlueStore::expand_devices(std::ostream&)+0xa71) [0x560c04563311]
 4: (main()+0x2c81) [0x560c04461791]
 5: (__libc_start_main()+0xf3) [0x7ff9535b16a3]
 6: (_start()+0x2e) [0x560c04481ffe]
*** Caught signal (Aborted) **
 in thread 7ff95fc5e240 thread_name:ceph-bluestore-
2020-11-02T10:51:58.077+0000 7ff95fc5e240 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.4/rpm/el8/BUILD/ceph-15.2.4/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::expand_devices(std::ostream&)' thread 7ff95fc5e240 time 2020-11-02T10:51:58.079649+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.4/rpm/el8/BUILD/ceph-15.2.4/src/os/bluestore/BlueStore.cc: 6837: FAILED ceph_assert(r == 0)

 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7ff955d7eafc]
 2: (()+0x276d16) [0x7ff955d7ed16]
 3: (BlueStore::expand_devices(std::ostream&)+0xa71) [0x560c04563311]
 4: (main()+0x2c81) [0x560c04461791]
 5: (__libc_start_main()+0xf3) [0x7ff9535b16a3]
 6: (_start()+0x2e) [0x560c04481ffe]

 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
 1: (()+0x12dd0) [0x7ff954f80dd0]
 2: (gsignal()+0x10f) [0x7ff9535c570f]
 3: (abort()+0x127) [0x7ff9535afb25]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7ff955d7eb4d]
 5: (()+0x276d16) [0x7ff955d7ed16]
 6: (BlueStore::expand_devices(std::ostream&)+0xa71) [0x560c04563311]
 7: (main()+0x2c81) [0x560c04461791]
 8: (__libc_start_main()+0xf3) [0x7ff9535b16a3]
 9: (_start()+0x2e) [0x560c04481ffe]
2020-11-02T10:51:58.077+0000 7ff95fc5e240 -1 *** Caught signal (Aborted) **
 in thread 7ff95fc5e240 thread_name:ceph-bluestore-

 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
 1: (()+0x12dd0) [0x7ff954f80dd0]
 2: (gsignal()+0x10f) [0x7ff9535c570f]
 3: (abort()+0x127) [0x7ff9535afb25]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7ff955d7eb4d]
 5: (()+0x276d16) [0x7ff955d7ed16]
 6: (BlueStore::expand_devices(std::ostream&)+0xa71) [0x560c04563311]
 7: (main()+0x2c81) [0x560c04461791]
 8: (__libc_start_main()+0xf3) [0x7ff9535b16a3]
 9: (_start()+0x2e) [0x560c04481ffe]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

   -47> 2020-11-02T10:51:57.777+0000 7ff95fc5e240 -1 bluefs _check_new_allocations invalid extent 1: 0xa04d0000~10000: duplicate reference, ino 68
   -46> 2020-11-02T10:51:57.777+0000 7ff95fc5e240 -1 bluefs mount failed to replay log: (14) Bad address
   -45> 2020-11-02T10:51:57.777+0000 7ff95fc5e240 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_bluefs failed bluefs mount: (14) Bad address
   -44> 2020-11-02T10:51:58.077+0000 7ff95fc5e240 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.4/rpm/el8/BUILD/ceph-15.2.4/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::expand_devices(std::ostream&)' thread 7ff95fc5e240 time 2020-11-02T10:51:58.079649+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.4/rpm/el8/BUILD/ceph-15.2.4/src/os/bluestore/BlueStore.cc: 6837: FAILED ceph_assert(r == 0)

 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7ff955d7eafc]
 2: (()+0x276d16) [0x7ff955d7ed16]
 3: (BlueStore::expand_devices(std::ostream&)+0xa71) [0x560c04563311]
 4: (main()+0x2c81) [0x560c04461791]
 5: (__libc_start_main()+0xf3) [0x7ff9535b16a3]
 6: (_start()+0x2e) [0x560c04481ffe]

   -43> 2020-11-02T10:51:58.077+0000 7ff95fc5e240 -1 *** Caught signal (Aborted) **
 in thread 7ff95fc5e240 thread_name:ceph-bluestore-

 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
 1: (()+0x12dd0) [0x7ff954f80dd0]
 2: (gsignal()+0x10f) [0x7ff9535c570f]
 3: (abort()+0x127) [0x7ff9535afb25]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7ff955d7eb4d]
 5: (()+0x276d16) [0x7ff955d7ed16]
 6: (BlueStore::expand_devices(std::ostream&)+0xa71) [0x560c04563311]
 7: (main()+0x2c81) [0x560c04461791]
 8: (__libc_start_main()+0xf3) [0x7ff9535b16a3]
 9: (_start()+0x2e) [0x560c04481ffe]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

    -8> 2020-11-02T10:51:57.777+0000 7ff95fc5e240 -1 bluefs _check_new_allocations invalid extent 1: 0xa04d0000~10000: duplicate reference, ino 68
    -7> 2020-11-02T10:51:57.777+0000 7ff95fc5e240 -1 bluefs mount failed to replay log: (14) Bad address
    -3> 2020-11-02T10:51:57.777+0000 7ff95fc5e240 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_bluefs failed bluefs mount: (14) Bad address
    -1> 2020-11-02T10:51:58.077+0000 7ff95fc5e240 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.4/rpm/el8/BUILD/ceph-15.2.4/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::expand_devices(std::ostream&)' thread 7ff95fc5e240 time 2020-11-02T10:51:58.079649+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.4/rpm/el8/BUILD/ceph-15.2.4/src/os/bluestore/BlueStore.cc: 6837: FAILED ceph_assert(r == 0)

 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7ff955d7eafc]
 2: (()+0x276d16) [0x7ff955d7ed16]
 3: (BlueStore::expand_devices(std::ostream&)+0xa71) [0x560c04563311]
 4: (main()+0x2c81) [0x560c04461791]
 5: (__libc_start_main()+0xf3) [0x7ff9535b16a3]
 6: (_start()+0x2e) [0x560c04481ffe]

     0> 2020-11-02T10:51:58.077+0000 7ff95fc5e240 -1 *** Caught signal (Aborted) **
 in thread 7ff95fc5e240 thread_name:ceph-bluestore-

 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
 1: (()+0x12dd0) [0x7ff954f80dd0]
 2: (gsignal()+0x10f) [0x7ff9535c570f]
 3: (abort()+0x127) [0x7ff9535afb25]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7ff955d7eb4d]
 5: (()+0x276d16) [0x7ff955d7ed16]
 6: (BlueStore::expand_devices(std::ostream&)+0xa71) [0x560c04563311]
 7: (main()+0x2c81) [0x560c04461791]
 8: (__libc_start_main()+0xf3) [0x7ff9535b16a3]
 9: (_start()+0x2e) [0x560c04481ffe]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

reraise_fatal: default handler for signal 6 didn't terminate the process?

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 18.04
  • Kernel (e.g. uname -a): 4.15.0-88-generic
  • Cloud provider or hardware configuration: Hyper-V VM
  • Rook version (use rook version inside of a Rook Pod): master branch (commit 44bf443dca7bce155157956950b23faa1152b5e3)
  • Storage backend version (e.g. for ceph do ceph -v): ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): kubeadm
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):

HEALTH_WARN Reduced data availability: 1 pg inactive, 1 pg peering; OSD count 1 < osd_pool_default_size 3
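For reference, this was obtained from the Rook toolbox with a command along these lines (assuming the standard rook-ceph-tools deployment from the Rook examples):

$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health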

Most upvoted comments

@travisn @leseb

I’d like to create a fix that changes the volume type of /var/lib/ceph/osd/ceph-X from emptyDir to hostPath. Please let me know if you have any opinions or concerns.

Yes, using a hostPath for the osd path makes sense to enable the lock files to be shared. I can’t think of a better way. A couple of additional thoughts:

  • The osd prepare pod doesn’t need access to this host path, right?
  • Portable OSDs on PVCs (with storageClassDeviceSets) seem like they should already have protection based on the PV, but it wouldn’t hurt to have the host path in that scenario in case the OSD is re-created on the same node.
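A minimal sketch of what this change could look like in the generated OSD pod spec (a hypothetical fragment for illustration only; the actual fix lives in Rook's pod-spec generation code, and the volume name and host path shown here are assumptions):

volumes:
- name: rook-data  # hypothetical volume name
  # Before: an emptyDir, so each pod instance gets a fresh private directory
  # and lock files are lost across restarts.
  # emptyDir: {}
  # After: a hostPath under dataDirHostPath, shared by successive pod
  # instances on the same node (path is hypothetical).
  hostPath:
    path: /var/lib/rook/rook-ceph/ceph-osd-0
    type: DirectoryOrCreate

With a hostPath, a recreated OSD pod on the same node sees the previous instance's lock files, so expand-bluefs should not run while a previous ceph-osd process may still hold the OSD.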

The corresponding Ceph tracker issue:

https://tracker.ceph.com/issues/48036