rook: init-copy-binaries can leave container in a bad state

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior: If the program crashes (due to OOM, segfault, etc.), restarting the container should not leave things in a bad state, but currently it does.

Expected behavior: If the copy fails, the copy is re-attempted on next init container run

How to reproduce it (minimal and precise):

  1. Set a limit range on the namespace:
apiVersion: v1
kind: LimitRange
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  limits:
  - type: Container
    defaultRequest:
      memory: 8Mi
    default:
      memory: 64Mi
  2. Restart rook-ceph-operator so it triggers the rook-ceph-csi-detect-version job
  3. Watch the job get OOMKilled and then restarted

Environment:

  • Rook Version: 1.3.1
  • Ceph CSI Version: 2.0.1

Details: Looking at the code, it looks like the copy is not atomic, and it is skipped if the destination file already exists. So when the program crashes mid-copy, an incomplete file with the wrong permissions is left in place, and the next run skips the copy instead of redoing it. This manifests as a “Permission Denied” error when trying to run the executable, because the executable bits were never set.

Aside from the issue above, why does copying files take over 64MiB of memory!? Can’t you just make this a cp -a instead of implementing a simple function like this in Go?

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 32 (17 by maintainers)

Most upvoted comments

Hey @parth-gr, I will take it up. Please assign it to me. I have already looked at the code once before. I will send a PR for this within the week.