noobaa-core: Backend SC caused DB pod restarts; the pod never reached the Running state

Environment info

  • NooBaa Version: master-20210627
  • Platform: OCP 4.6.16

Actual behavior

  1. Upgrading to master-20210627 caused the DB pod to crash

Expected behavior

  1. The DB pod shouldn't crash

Steps to reproduce

  1. Start from the old code: master-20210622
  2. Upgrade in place to master-20210627, so that accounts and buckets are retained
[root@ocp-akshat-1-inf akshat]# oc logs noobaa-db-pg-0     |more
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2021-06-28 05:27:42.521 UTC [25] PANIC:  could not read file "global/pg_control": Input/output error
 stopped waiting
pg_ctl: could not start server
Examine the log output.
[root@ocp-akshat-1-inf akshat]# oc logs noobaa-db-pg-0     |less -R
[root@ocp-akshat-1-inf akshat]# podn
NAME                                               READY   STATUS        RESTARTS   AGE
noobaa-core-0                                      1/1     Running       0          22m
noobaa-db-pg-0                                     0/1     Error         2          22m
noobaa-default-backing-store-noobaa-pod-b0a5d78b   0/1     Terminating   0          4d23h
noobaa-endpoint-6886745f66-rdd4m                   1/1     Running       0          22m
noobaa-operator-57d449689c-zb56f                   1/1     Running       0          22m
[root@ocp-akshat-1-inf akshat]# oc logs noobaa-db-pg-0   -p  |less -R
[root@ocp-akshat-1-inf akshat]# podn
NAME                                               READY   STATUS             RESTARTS   AGE
noobaa-core-0                                      1/1     Running            0          22m
noobaa-db-pg-0                                     0/1     CrashLoopBackOff   2          22m
noobaa-default-backing-store-noobaa-pod-b0a5d78b   0/1     Terminating        0          4d23h
noobaa-endpoint-6886745f66-rdd4m                   1/1     Running            0          23m
noobaa-operator-57d449689c-zb56f                   1/1     Running            0          23m

QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       21m                    default-scheduler  Successfully assigned noobaa/noobaa-db-pg-0 to worker2.ocp-akshat-1.cp.fyre.ibm.com
  Normal   AddedInterface  21m                    multus             Add eth0 [10.254.17.98/22]
  Normal   Pulling         21m                    kubelet            Pulling image "noobaa/noobaa-core:master-20210627"
  Normal   Pulled          20m                    kubelet            Successfully pulled image "noobaa/noobaa-core:master-20210627" in 30.126640313s
  Normal   Created         20m                    kubelet            Created container init
  Normal   Started         20m                    kubelet            Started container init
  Warning  Failed          2m34s (x4 over 3m14s)  kubelet            Error: failed to resolve symlink "/var/lib/kubelet/pods/ce44b338-0155-430c-97d7-5408c230e0b4/volumes/kubernetes.io~csi/pvc-d1c22d45-5f3b-4684-8f4c-48880815f451/mount": lstat /var/mnt/fs1: stale NFS file handle
  Normal   Pulled          104s (x6 over 20m)     kubelet            Container image "centos/postgresql-12-centos7" already present on machine
  Normal   Created         103s (x2 over 20m)     kubelet            Created container db
  Normal   Started         103s (x2 over 20m)     kubelet            Started container db
  Warning  BackOff         11s (x11 over 2m21s)   kubelet            Back-off restarting failed container
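
The `stale NFS file handle` warning in the events corresponds to the `ESTALE` errno, and the `Input/output error` in the PostgreSQL log corresponds to `EIO`. A minimal sketch for checking a path for either condition, assuming it is run on the worker node that hosts the mount. `classify_path` is a hypothetical helper name, not part of NooBaa or OpenShift:

```python
import errno
import os


def classify_path(path):
    """Return a short status string describing whether `path` can be stat'ed.

    Hypothetical helper for illustration only. It distinguishes the two
    errors seen in this issue (ESTALE and EIO) from a missing path.
    """
    try:
        # lstat is the same call that failed in the kubelet event.
        os.lstat(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return "stale"      # "stale NFS file handle"
        if exc.errno == errno.EIO:
            return "io-error"   # "Input/output error"
        if exc.errno == errno.ENOENT:
            return "missing"
        # Any other failure: report the symbolic errno name if known.
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"
```

For example, `classify_path("/var/mnt/fs1")` on worker2 would be expected to report `stale` while the condition in the event above persists, though that has not been verified here.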

More information - Screenshots / Logs / Other output

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 26 (9 by maintainers)

Most upvoted comments

@nimrod-becker Here is the list of PVCs:

NAME                                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                     AGE
db-noobaa-db-pg-0                                  Bound    pvc-0f3ae11a-971c-4480-9398-d3f37fb145a8   50Gi       RWO            ibm-spectrum-scale-csi-fileset   6d21h
gpfs-vol-pvc-new1                                  Bound    gpfs-pv-3                                  250Gi      RWX                                             4d20h
gpfs-vol-pvc-new11                                 Bound    gpfs-pv-31                                 250Gi      RWX                                             3d16h
noobaa-default-backing-store-noobaa-pvc-1ff51808   Bound    pvc-c164a9ac-1855-4788-8219-a2f2ab8ce831   50Gi       RWO            ibm-spectrum-scale-csi-fileset   6d21h
[root@api.ns.cp.fyre.ibm.com ~]#

gpfs-vol-pvc-new1 is used by the endpoint pod.
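
To see at a glance which claims depend on a given storage class, the PVC listing can be grouped by class. A minimal sketch, assuming the input is the parsed JSON from `oc get pvc -o json` (a list object whose items carry `metadata.name` and `spec.storageClassName`). `pvcs_by_storage_class` is a hypothetical helper name:

```python
from collections import defaultdict


def pvcs_by_storage_class(pvc_list):
    """Group PVC names by storage class.

    `pvc_list` is the dict parsed from `oc get pvc -o json`.
    Claims with no storage class (for example, bound to a statically
    created PV) are grouped under the placeholder key "<none>".
    """
    groups = defaultdict(list)
    for item in pvc_list.get("items", []):
        storage_class = item.get("spec", {}).get("storageClassName") or "<none>"
        groups[storage_class].append(item["metadata"]["name"])
    return dict(groups)
```

One way to feed it is `json.load(sys.stdin)` with the `oc` output piped in. For the listing above, `db-noobaa-db-pg-0` and the default backing store claim would fall under `ibm-spectrum-scale-csi-fileset`, and the two `gpfs-vol-pvc-*` claims under `<none>`.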