# noobaa-core: Backend SC caused DB pod restarts; the DB pod never reached the Running state
## Environment info

- NooBaa Version: master-20210627
- Platform: OCP 4.6.16
## Actual behavior

- Upgrading to master-20210627 caused the DB pod to crash
## Expected behavior

1. The DB pod shouldn't crash
## Steps to reproduce

- Old code: master-20210622
- Upgraded to master-20210627 in order to retain accounts and buckets
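The upgrade step above can be sketched as a merge patch against the NooBaa CR. This is a minimal sketch, not the exact procedure used in this report: the CR name `noobaa`, the namespace `noobaa`, and upgrading via `spec.image` are assumptions to verify against your install.

```shell
# Hypothetical upgrade sketch: build the merge patch pointing the NooBaa CR at
# the new core image. CR/namespace names and the spec.image field are
# assumptions about this install, not confirmed by the issue.
image="noobaa/noobaa-core:master-20210627"
patch=$(printf '{"spec":{"image":"%s"}}' "$image")
echo "$patch"   # prints {"spec":{"image":"noobaa/noobaa-core:master-20210627"}}

# Then, against the cluster (not run here):
#   oc patch noobaa noobaa -n noobaa --type merge -p "$patch"
```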
```
[root@ocp-akshat-1-inf akshat]# oc logs noobaa-db-pg-0 |more
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2021-06-28 05:27:42.521 UTC [25] PANIC: could not read file "global/pg_control": Input/output error
stopped waiting
pg_ctl: could not start server
Examine the log output.
```
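The PANIC above points at the storage layer rather than the usual stale-lock-file case. A minimal triage sketch that classifies the log line locally (the in-pod `PGDATA` path in the trailing comment is an assumption about the `centos/postgresql-12-centos7` image, not taken from the issue):

```shell
# Classify the failing log line: "Input/output error" reading global/pg_control
# is a read failure from the underlying volume, distinct from the
# "another server might be running" postmaster-lock case.
log='2021-06-28 05:27:42.521 UTC [25] PANIC: could not read file "global/pg_control": Input/output error'
case "$log" in
  *'Input/output error'*)              verdict='storage-level I/O error' ;;
  *'another server might be running'*) verdict='stale postmaster lock'  ;;
  *)                                   verdict='other failure'          ;;
esac
echo "$verdict"   # prints storage-level I/O error

# In-cluster check (not run here; the PGDATA path is an assumption):
#   oc exec noobaa-db-pg-0 -- ls -l /var/lib/pgsql/data/userdata/global/pg_control
```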
```
[root@ocp-akshat-1-inf akshat]# oc logs noobaa-db-pg-0 |less -R
[root@ocp-akshat-1-inf akshat]# podn
NAME                                               READY   STATUS        RESTARTS   AGE
noobaa-core-0                                      1/1     Running       0          22m
noobaa-db-pg-0                                     0/1     Error         2          22m
noobaa-default-backing-store-noobaa-pod-b0a5d78b   0/1     Terminating   0          4d23h
noobaa-endpoint-6886745f66-rdd4m                   1/1     Running       0          22m
noobaa-operator-57d449689c-zb56f                   1/1     Running       0          22m
[root@ocp-akshat-1-inf akshat]# oc logs noobaa-db-pg-0 -p |less -R
[root@ocp-akshat-1-inf akshat]# podn
NAME                                               READY   STATUS             RESTARTS   AGE
noobaa-core-0                                      1/1     Running            0          22m
noobaa-db-pg-0                                     0/1     CrashLoopBackOff   2          22m
noobaa-default-backing-store-noobaa-pod-b0a5d78b   0/1     Terminating        0          4d23h
noobaa-endpoint-6886745f66-rdd4m                   1/1     Running            0          23m
noobaa-operator-57d449689c-zb56f                   1/1     Running            0          23m
```
```
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       21m                    default-scheduler  Successfully assigned noobaa/noobaa-db-pg-0 to worker2.ocp-akshat-1.cp.fyre.ibm.com
  Normal   AddedInterface  21m                    multus             Add eth0 [10.254.17.98/22]
  Normal   Pulling         21m                    kubelet            Pulling image "noobaa/noobaa-core:master-20210627"
  Normal   Pulled          20m                    kubelet            Successfully pulled image "noobaa/noobaa-core:master-20210627" in 30.126640313s
  Normal   Created         20m                    kubelet            Created container init
  Normal   Started         20m                    kubelet            Started container init
  Warning  Failed          2m34s (x4 over 3m14s)  kubelet            Error: failed to resolve symlink "/var/lib/kubelet/pods/ce44b338-0155-430c-97d7-5408c230e0b4/volumes/kubernetes.io~csi/pvc-d1c22d45-5f3b-4684-8f4c-48880815f451/mount": lstat /var/mnt/fs1: stale NFS file handle
  Normal   Pulled          104s (x6 over 20m)     kubelet            Container image "centos/postgresql-12-centos7" already present on machine
  Normal   Created         103s (x2 over 20m)     kubelet            Created container db
  Normal   Started         103s (x2 over 20m)     kubelet            Started container db
  Warning  BackOff         11s (x11 over 2m21s)   kubelet            Back-off restarting failed container
```
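The `failed to resolve symlink ... stale NFS file handle` warning above is the strongest root-cause candidate: the kubelet can no longer reach the Spectrum Scale mount backing the DB PVC, which would explain the I/O error on `pg_control`. A small sketch that pulls the stale path out of the event message (the node-side `oc debug` command in the trailing comment is an assumed follow-up step, not from the issue):

```shell
# Extract the stale mount path from the kubelet event so it can be checked on
# the node itself.
event='Error: failed to resolve symlink "/var/lib/kubelet/pods/ce44b338-0155-430c-97d7-5408c230e0b4/volumes/kubernetes.io~csi/pvc-d1c22d45-5f3b-4684-8f4c-48880815f451/mount": lstat /var/mnt/fs1: stale NFS file handle'
stale_path=$(printf '%s\n' "$event" | sed -n 's/.*lstat \([^:]*\):.*/\1/p')
echo "$stale_path"   # prints /var/mnt/fs1

# Node-side check (not run here):
#   oc debug node/worker2.ocp-akshat-1.cp.fyre.ibm.com -- chroot /host stat /var/mnt/fs1
```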
## More information - Screenshots / Logs / Other output
## About this issue

- Original URL
- State: closed
- Created 3 years ago
- Comments: 26 (9 by maintainers)
@nimrod-becker Here is the list of PVCs:

```
NAME                                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                     AGE
db-noobaa-db-pg-0                                  Bound    pvc-0f3ae11a-971c-4480-9398-d3f37fb145a8   50Gi       RWO            ibm-spectrum-scale-csi-fileset   6d21h
gpfs-vol-pvc-new1                                  Bound    gpfs-pv-3                                  250Gi      RWX                                             4d20h
gpfs-vol-pvc-new11                                 Bound    gpfs-pv-31                                 250Gi      RWX                                             3d16h
noobaa-default-backing-store-noobaa-pvc-1ff51808   Bound    pvc-c164a9ac-1855-4788-8219-a2f2ab8ce831   50Gi       RWO            ibm-spectrum-scale-csi-fileset   6d21h
[root@api.ns.cp.fyre.ibm.com ~]#
```
gpfs-vol-pvc-new1 is the PVC used by the endpoint pod.