noobaa-core: when the noobaa-db-pg pod is migrated, new users and new buckets cannot be created
Environment info
- NooBaa Version: 5.9.0 (RC build of ODF 4.9.0)
- Platform: OpenShift 4.9.5 (Kubernetes v1.22.0-rc.0+a44d0f0)
```
$ noobaa status
INFO[0000] CLI version: 5.9.0
INFO[0000] noobaa-image: quay.io/rhceph-dev/mcg-core@sha256:6ce2ddee7aff6a0e768fce523a77c998e1e48e25d227f93843d195d65ebb81b9
INFO[0000] operator-image: quay.io/rhceph-dev/mcg-operator@sha256:cc293c7fe0fdfe3812f9d1af30b6f9c59e97d00c4727c4463a5b9d3429f4278e
INFO[0000] noobaa-db-image: registry.redhat.io/rhel8/postgresql-12@sha256:b3e5b7bc6acd6422f928242d026171bcbed40ab644a2524c84e8ccb4b1ac48ff
INFO[0000] Namespace: openshift-storage
```
```
$ oc version
Client Version: 4.9.5
Server Version: 4.9.5
Kubernetes Version: v1.22.0-rc.0+a44d0f0
```
Actual behavior
Note: this defect was created from the comments on https://github.com/noobaa/noobaa-core/issues/6853.
In a node-down scenario, when the worker node running the noobaa-db pod is shut down, the pod should be migrated to another node, and after that new users should be creatable and new I/O should be able to start. That does not appear to be the case.
Expected behavior
After the noobaa-db-pg pod is rescheduled and reaches the Running state, new users and new buckets can be created and I/O resumes.
Steps to reproduce
Basic question here: when the node currently running noobaa-db-pg-0 is brought down, noobaa-db-pg-0 is rescheduled to another worker node and reaches the Running state after a delay of about 6 minutes. However, after that we cannot create the following (a reproduction sketch follows this list):

- new users
- new buckets (using s3 mb)
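A minimal sketch of the checks that fail at this point, assuming a CLI build that supports `noobaa account create`, admin S3 credentials from the noobaa-admin secret already configured for the aws CLI, and placeholder account/bucket names:

```sh
NS=openshift-storage

# Attempt to create a new NooBaa account (hangs/fails in this state).
noobaa -n "$NS" account create test-user

# Attempt to create a new bucket through the exposed S3 route.
S3_HOST=$(oc -n "$NS" get route s3 -o jsonpath='{.spec.host}')
aws s3 mb s3://test-bucket --endpoint-url "https://$S3_HOST" --no-verify-ssl
```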
Step 1: Initial pod placement:
```
NAME             READY   STATUS    RESTARTS   AGE   IP              NODE                                  NOMINATED NODE   READINESS GATES
noobaa-core-0    1/1     Running   0          20h   10.254.23.179   worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0   1/1     Running   0          31m   10.254.12.12    worker1.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
```
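The listings in these steps presumably come from something like the following (namespace taken from the environment info; add -w to stream updates while the failover is in progress):

```sh
oc -n openshift-storage get pods -o wide
```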
Step 2: Brought down worker1, where noobaa-db-pg-0 is running (one way to do this is sketched after the node listing below):
```
worker0.rkomandu-ta.cp.fyre.ibm.com   Ready      worker   53d   v1.22.0-rc.0+a44d0f0
worker1.rkomandu-ta.cp.fyre.ibm.com   NotReady   worker   53d   v1.22.0-rc.0+a44d0f0
worker2.rkomandu-ta.cp.fyre.ibm.com   Ready      worker   53d   v1.22.0-rc.0+a44d0f0
```
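One way to simulate an abrupt node failure on OpenShift, as a hedged sketch (assumes cluster-admin access; deliberately no graceful drain, since the point is to model a failure; the node may equally have been powered off at the hypervisor):

```sh
# Hypothetical sketch: halt the node's OS abruptly to simulate a node failure.
oc debug node/worker1.rkomandu-ta.cp.fyre.ibm.com -- chroot /host shutdown -h now
```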
Step 3: noobaa-db-pg-0 moved from worker1 to worker2:
```
noobaa-db-pg-0   0/1   Init:0/2   0   15s   <none>   worker2.rkomandu-ta.cp.fyre.ibm.com   <none>   <none>
```
Step 4: The pod is still initializing:
```
noobaa-db-pg-0   0/1   Init:0/2   0   3m56s   <none>   worker2.rkomandu-ta.cp.fyre.ibm.com   <none>   <none>
```
Step 5: After 6mXsec, the pod got into the Running state:
```
noobaa-db-pg-0   1/1   Running   0   91m   10.254.23.217   worker2.rkomandu-ta.cp.fyre.ibm.com   <none>   <none>
```
Step 6: The noobaa API call to list_accounts just hangs:
```
$ noobaa api account_api list_accounts {}
INFO[0000] ✅ Exists: NooBaa "noobaa"
INFO[0000] ✅ Exists: Service "noobaa-mgmt"
INFO[0000] ✅ Exists: Secret "noobaa-operator"
INFO[0000] ✅ Exists: Secret "noobaa-admin"
INFO[0000] ✈️ RPC: account.list_accounts() Request: map[]
WARN[0000] RPC: GetConnection creating connection to wss://localhost:42325/rpc/ 0xc000a996d0
INFO[0000] RPC: Connecting websocket (0xc000a996d0) &{RPC:0xc0004bd130 Address:wss://localhost:42325/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}
INFO[0000] RPC: Connected websocket (0xc000a996d0) &{RPC:0xc0004bd130 Address:wss://localhost:42325/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}
```
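When the call hangs like this, a few quick checks can narrow down whether the core pod can actually reach the rescheduled database. A hedged sketch (pod and namespace names as in this report; pg_isready availability in the postgresql-12 image is an assumption):

```sh
NS=openshift-storage

# Bound the hanging RPC call so the shell gets control back (coreutils timeout).
timeout 60 noobaa -n "$NS" api account_api list_accounts '{}'

# Check whether Postgres is up and accepting connections on the new node.
oc -n "$NS" exec noobaa-db-pg-0 -- pg_isready

# Scan the core pod's recent log for database connection errors.
oc -n "$NS" logs noobaa-core-0 --since=10m | grep -iE 'postgres|connect|error'
```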
This is a bigger problem for any failover testing: new users cannot be created, and no new buckets can be created either.
Attaching must-gather logs
must-gather.local-noobaa-db-pg-0.tar.gz
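For reference, logs like the attached are typically collected with oc adm must-gather; the storage-specific gather image below is an assumption and varies by release:

```sh
# Default cluster must-gather, plus the OCS/ODF-specific gather image (assumed name).
oc adm must-gather
oc adm must-gather --image=registry.redhat.io/ocs4/ocs-must-gather-rhel8:latest
```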
Archive.zip
@rkomandu, could you try the following procedure once noobaa-db comes back into a working state but the noobaa RPC API calls still do not respond?
Please see the attached archive; it includes a simple RPC test that calls account.list_accounts() using the internal cluster address of noobaa-core.
The archive includes
Sample run
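A rough, hypothetical sketch of how such a test could be run from inside the cluster, so that it exercises the internal service address rather than a port-forwarded localhost one (list_accounts_test.js is a placeholder for the script in Archive.zip):

```sh
NS=openshift-storage

# Copy the RPC test script into the core pod and run it there, so the call
# goes to noobaa-core's internal cluster address rather than localhost.
oc -n "$NS" cp ./list_accounts_test.js noobaa-core-0:/tmp/list_accounts_test.js
oc -n "$NS" exec noobaa-core-0 -- node /tmp/list_accounts_test.js
```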