kubernetes: CSI block: Why is NodePublishVolume called in SetUpDevice?

What happened: I’m doing a retroactive review of the CSI block implementation and I have a few questions:

Why are there 4 different paths that point to the same block device? I see:

  • global map path
  • staging target path
  • publish target path
  • pod path

This seems like an unnecessary amount of redirection. Ideally, global map path == staging target path, and publish target path == pod path.

Along the same lines, I see that the publish target path is not a per-pod path, and that NodeStageVolume and NodePublishVolume are both called in the same SetUpDevice() function. This makes the NodePublish call effectively useless, because it just bind mounts the volume to another global location, which is what NodeStage already provides. It also differs from filesystem semantics, where NodeStage/Unstage is serialized per volume and NodePublish/Unpublish is serialized per pod.
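To make the difference concrete, here is a rough Go sketch that only models the order of calls, not real kubelet code. The functions `currentFlow` and `filesystemLikeFlow` are hypothetical names made up for illustration, and the strings they return are just labels for the CSI calls and for the per-pod MapDevice step.

```go
package main

import "fmt"

// currentFlow models the behavior described above (an assumption-level
// simplification): stage and publish both run once per volume inside
// SetUpDevice, and only the map step runs per pod.
func currentFlow(volume string, pods []string) []string {
	calls := []string{
		"NodeStageVolume(" + volume + ")",
		"NodePublishVolume(" + volume + ")",
	}
	for _, pod := range pods {
		calls = append(calls, "MapDevice("+volume+", "+pod+")")
	}
	return calls
}

// filesystemLikeFlow models the semantics asked for in this issue:
// stage once per volume, publish once per pod.
func filesystemLikeFlow(volume string, pods []string) []string {
	calls := []string{"NodeStageVolume(" + volume + ")"}
	for _, pod := range pods {
		calls = append(calls, "NodePublishVolume("+volume+", "+pod+")")
	}
	return calls
}

func main() {
	pods := []string{"pod-a", "pod-b"}

	fmt.Println("current:")
	for _, call := range currentFlow("vol-1", pods) {
		fmt.Println("  " + call)
	}

	fmt.Println("filesystem-like:")
	for _, call := range filesystemLikeFlow("vol-1", pods) {
		fmt.Println("  " + call)
	}
}
```

In the first trace NodePublishVolume appears once no matter how many pods use the volume; in the second it appears once per pod, which is what the filesystem path does.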

What you expected to happen: Can we simplify this and try to align with how the filesystem semantics work? The more intermediate steps we have, the more likely one of them is going to fail. Also each additional mount consumes kernel resources, and will make operations that need to list all mounts slower.

@kubernetes/sig-storage-bugs /assign @mkimuram @vladimirvivien @bswartz

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 24 (21 by maintainers)

Most upvoted comments

@bswartz @mkimuram @wongma7 and I discussed this in more detail today. Summary:

  • Let’s move the publish call back to MapDevice so that it is called per pod and not per volume. This is to be consistent with how NodePublish/Stage calls work for filesystem volumes. This may also require adding UnmapDevice to the BlockVolume interface.
  • We may need to clarify in the CSI spec the semantics around multi access on a single node.
  • We are unable to simplify the number of paths for now. globalMapPath and podPath are needed for Kubernetes to reference-count the pods using the device, which gates the unstage call. The path returned by NodeStageVolume is not actually a device; it’s a directory where a plugin keeps information about the device.
  • It’s a little odd that the Kubernetes reference-counting mechanism is exposed to the plugin. Maybe that can be cleaned up later.
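
As a rough illustration of the reference counting mentioned in the summary, the Go sketch below gates the unstage step on the last pod going away. It is not the kubelet implementation: `deviceRefs`, `Map`, and `Unmap` are hypothetical names, and an in-memory map stands in for whatever Kubernetes tracks under globalMapPath and podPath.

```go
package main

import "fmt"

// deviceRefs tracks which pods use each block volume on this node.
// Hypothetical stand-in for the path-based bookkeeping in the summary.
type deviceRefs struct {
	pods map[string]map[string]bool // volume -> set of pod names
}

func newDeviceRefs() *deviceRefs {
	return &deviceRefs{pods: make(map[string]map[string]bool)}
}

// Map records that pod uses volume. It returns true when this is the
// first pod for the volume, i.e. when staging would be needed.
func (r *deviceRefs) Map(volume, pod string) bool {
	set, ok := r.pods[volume]
	if !ok {
		set = make(map[string]bool)
		r.pods[volume] = set
	}
	first := len(set) == 0
	set[pod] = true
	return first
}

// Unmap removes pod from volume. It returns true only when the last pod
// has gone, i.e. when it would be safe to unstage the volume.
func (r *deviceRefs) Unmap(volume, pod string) bool {
	set, ok := r.pods[volume]
	if !ok {
		return false // unknown volume: nothing to unstage
	}
	delete(set, pod)
	if len(set) > 0 {
		return false
	}
	delete(r.pods, volume)
	return true
}

func main() {
	refs := newDeviceRefs()
	fmt.Println("stage needed:", refs.Map("vol-1", "pod-a"))
	fmt.Println("stage needed:", refs.Map("vol-1", "pod-b"))
	fmt.Println("can unstage:", refs.Unmap("vol-1", "pod-a"))
	fmt.Println("can unstage:", refs.Unmap("vol-1", "pod-b"))
}
```

With per-pod publish, each `Map`/`Unmap` pair would line up with one NodePublishVolume/NodeUnpublishVolume call, while the boolean results mark where NodeStageVolume and NodeUnstageVolume would fit.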