ceph-csi: RBD: OOMKills occur when the secret metadata encryption type is used with multiple PVC create requests.
I tested secret-based encryption with 3.7.1; I don't see any crash with the limits below:
Limits:
  cpu: 500m
  memory: 256Mi
Requests:
  cpu: 250m
  memory: 256Mi
[🎩︎]mrajanna@fedora rbd $]kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
claim0 Bound pvc-81cb47af-dfbd-4360-b444-7636e8a2c359 1Gi RWO rook-ceph-block 4m48s
claim1 Bound pvc-c1c20239-db18-4c56-b9da-8745b0046428 1Gi RWO rook-ceph-block 4m47s
claim10 Bound pvc-7d6e66d8-ceea-4e0c-9775-64aa84b1548b 1Gi RWO rook-ceph-block 4m46s
claim2 Bound pvc-3a61744c-2d0e-46c1-9d8c-3b0f5f49574c 1Gi RWO rook-ceph-block 4m47s
claim3 Bound pvc-aa2613fb-4db8-4254-801b-8d9d72e83979 1Gi RWO rook-ceph-block 4m47s
claim4 Bound pvc-00fbe104-1809-4a11-8c39-9e3ceee9d5c9 1Gi RWO rook-ceph-block 4m47s
claim5 Bound pvc-7ad36255-755b-4bdd-a88c-4bbf695e8b69 1Gi RWO rook-ceph-block 4m47s
claim6 Bound pvc-bcd48a37-3d8d-47ce-b780-511020690397 1Gi RWO rook-ceph-block 4m47s
claim7 Bound pvc-1fac5dcf-1668-489a-8799-16630b74e971 1Gi RWO rook-ceph-block 4m47s
claim8 Bound pvc-949006fd-5b47-4e9c-acc1-3808894245f8 1Gi RWO rook-ceph-block 4m46s
claim9 Bound pvc-05ab1ba5-4c8f-41a5-ab09-c042b6089b23 1Gi RWO rook-ceph-block 4m46s
But when I tested with the metadata encryption type I can see the crash. Does this confirm we have a memory leak?
_Originally posted by @Madhu-1 in https://github.com/ceph/ceph-csi/issues/3402#issuecomment-1278691078_
On Mon, Jan 09, 2023 at 12:59:35AM -0800, Lennart Jern wrote:
Sounds like we need to limit the number of concurrent scrypt.Key calls. A semaphore like https://pkg.go.dev/golang.org/x/sync/semaphore might help with that.
I guess it needs some configurable option, passed on the command line of the container.
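A minimal sketch of that idea, assuming a hypothetical --max-crypto-ops flag and deriveKey helper (neither exists in ceph-csi today): a weighted semaphore gates scrypt.Key so that a burst of requests cannot all allocate scrypt's working memory at the same time.

```go
// Sketch only: bound concurrent scrypt.Key calls with a weighted semaphore.
// The flag name and the deriveKey helper are assumptions for illustration.
package main

import (
	"context"
	"flag"

	"golang.org/x/crypto/scrypt"
	"golang.org/x/sync/semaphore"
)

var maxCryptoOps = flag.Int64("max-crypto-ops", 2,
	"maximum number of concurrent scrypt key derivations")

// deriveKey acquires a semaphore slot before calling scrypt.Key, so at most
// *maxCryptoOps derivations run at once; other callers block until a slot
// frees up or their context is cancelled.
func deriveKey(ctx context.Context, sem *semaphore.Weighted, passphrase, salt []byte) ([]byte, error) {
	if err := sem.Acquire(ctx, 1); err != nil {
		return nil, err
	}
	defer sem.Release(1)

	// With example parameters N=32768 and r=8, each call needs roughly
	// 128*N*r bytes (~32 MiB), which is why unbounded concurrency can blow
	// past a 256Mi container memory limit.
	return scrypt.Key(passphrase, salt, 32768, 8, 1, 32)
}

func main() {
	flag.Parse()
	sem := semaphore.NewWeighted(*maxCryptoOps)

	if _, err := deriveKey(context.Background(), sem, []byte("passphrase"), []byte("salt")); err != nil {
		panic(err)
	}
}
```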
This OOMKill happens in the csi-rbdplugin nodeplugin pod, right? The nodeplugin calls cryptsetup.
This issue was originally opened to address OOMKills in the provisioner pod during volume creation. That seems to be resolved now.
I'd expect the CO, in this case kubelet or whichever component is responsible for issuing NodePublish/NodeStage calls, to have some kind of limit on simultaneous CSI calls.
Similar to the one for csi-provisioner: https://github.com/kubernetes-csi/external-provisioner/blob/cd81ed5d31835d1aabad74db869c1165df9e3666/cmd/csi-provisioner/csi-provisioner.go#L81
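If kubelet does not expose such a throttle, one place a cap on simultaneous node calls could live is a gRPC interceptor in the plugin itself. A rough sketch under that assumption (the --max-node-ops flag and everything else here is hypothetical, not an existing ceph-csi or external-provisioner option):

```go
// Sketch only: a unary gRPC server interceptor that bounds the number of
// in-flight CSI node calls, so a burst of stage/publish requests cannot all
// run cryptsetup/scrypt at the same time.
package main

import (
	"context"
	"flag"

	"golang.org/x/sync/semaphore"
	"google.golang.org/grpc"
)

var maxNodeOps = flag.Int64("max-node-ops", 4,
	"maximum number of simultaneous CSI node operations")

// limitConcurrency blocks new requests while sem is full and releases the
// slot once the handler returns.
func limitConcurrency(sem *semaphore.Weighted) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (interface{}, error) {
		if err := sem.Acquire(ctx, 1); err != nil {
			return nil, err
		}
		defer sem.Release(1)
		return handler(ctx, req)
	}
}

func main() {
	flag.Parse()
	sem := semaphore.NewWeighted(*maxNodeOps)

	// Register CSI services on this server as usual; every unary call now
	// passes through the concurrency limiter.
	_ = grpc.NewServer(grpc.UnaryInterceptor(limitConcurrency(sem)))
}
```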