seaweedfs-csi-driver: Commit from 7/Jul/2022 breaks Nomad

The commit from 7/Jul/2022, “Pods using the same volume share mount”, appears to have broken the CSI driver on Nomad. If I build a version prior to that commit, everything works as expected; from that commit onwards, the SeaweedFS mount always fails in the target container.

Error from the job mounting the volume:

Driver Failure | failed to create container: API error (400): invalid mount config for type "bind": bind source path does not exist: /opt/nomad/client/csi/monolith/seaweedfs/per-alloc/102f0f75-3dc2-7ed5-4ea3-0a2588fada96/code_server/rw-file-system-multi-node-multi-writer

From the CSI Job:

I0714 02:04:44     1 main.go:38] connect to filer 192.168.8.50:8888,192.168.8.51:8888,192.168.8.52:8888
I0714 02:04:44     1 driver.go:50] Driver: seaweedfs-csi-driver version: 1.0.0
I0714 02:04:44     1 driver.go:99] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0714 02:04:44     1 driver.go:99] Enabling volume access mode: SINGLE_NODE_WRITER
I0714 02:04:44     1 driver.go:99] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
I0714 02:04:44     1 driver.go:99] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I0714 02:04:44     1 driver.go:110] Enabling controller service capability: CREATE_DELETE_VOLUME
I0714 02:04:44     1 driver.go:110] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0714 02:04:44     1 driver.go:110] Enabling controller service capability: SINGLE_NODE_MULTI_WRITER
I0714 02:04:44     1 server.go:92] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0714 02:04:53     1 nodeserver.go:32] node stage volume code_server to /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:53     1 mounter_seaweedfs.go:38] mounting [192.168.8.50:8888 192.168.8.51:8888 192.168.8.52:8888] /testing to /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:53     1 mounter.go:39] Mounting fuse with command: weed and args: [-logtostderr=true mount -dirAutoCreate=true -umask=000 -dir=/local/csi/staging/code_server/rw-file-system-multi-node-multi-writer -collection=testing -filer=192.168.8.50:8888,192.168.8.51:8888,192.168.8.52:8888 -filer.path=/testing -cacheCapacityMB=256 -localSocket=/tmp/seaweedfs-mount-1677588823.sock -collectionQuotaMB=953 -replication=001 -concurrentWriters=32 -cacheDir=/alloc/cache_dir]
I0714 02:04:53     1 nodeserver.go:78] volume code_server successfully staged to /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:53     1 nodeserver.go:87] node publish volume code_server to /local/csi/per-alloc/102f0f75-3dc2-7ed5-4ea3-0a2588fada96/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:53     1 nodeserver.go:118] volume code_server successfully published to /local/csi/per-alloc/102f0f75-3dc2-7ed5-4ea3-0a2588fada96/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:58     1 nodeserver.go:125] node unpublish volume code_server from /local/csi/per-alloc/102f0f75-3dc2-7ed5-4ea3-0a2588fada96/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:58     1 nodeserver.go:192] node unstage volume code_server from /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:58     1 volume.go:117] unmounting volume code_server from /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
W0714 02:04:58     1 mounter.go:66] Unable to find PID of fuse mount /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer, it must have finished already

The CSI driver does mount the SeaweedFS volume at the staging folder, and accessing that folder on the host lets me view the files from the cluster.

In the old driver the file system is mounted at:

per-alloc/e01bf906-f4e1-64e4-5360-d049dc05355c/code_server/rw-file-system-multi-node-multi-writer

However, in the new one it’s mounted at:

/local/csi/staging/code_server/rw-file-system-multi-node-multi-writer

and the alloc just has a symbolic link to the mount:

per-alloc/a464a996-bb12-c2c6-4dec-4993ce31651b/code_server/rw-file-system-multi-node-multi-writer -> /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer

It appears that the target container either can’t follow the symlink or can’t reach /local/csi/staging/.
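
One possible explanation, going only by the paths above: /local/csi/... is how the plugin container sees the directory that the host sees as /opt/nomad/client/csi/monolith/seaweedfs/..., so a link pointing at /local/csi/staging/... would resolve inside the plugin container but dangle on the host, and Docker would then report the bind source as missing. That is an assumption, not something the logs confirm. A small Go sketch for checking it is below; `classify` and `demo` are hypothetical helper names, not part of the driver.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// classify reports how path looks from the mount namespace this process
// runs in: "missing", "not a symlink", "symlink" or "dangling symlink".
// For symlinks it also returns the raw link target.
func classify(path string) (string, string, error) {
	fi, err := os.Lstat(path) // Lstat does not follow the link
	if errors.Is(err, fs.ErrNotExist) {
		return "missing", "", nil
	}
	if err != nil {
		return "", "", err
	}
	if fi.Mode()&os.ModeSymlink == 0 {
		return "not a symlink", "", nil
	}
	target, err := os.Readlink(path)
	if err != nil {
		return "", "", err
	}
	if _, err := os.Stat(path); err != nil { // Stat follows the link
		if errors.Is(err, fs.ErrNotExist) {
			return "dangling symlink", target, nil
		}
		return "", "", err
	}
	return "symlink", target, nil
}

// demo builds a throwaway link to a path that does not exist and
// classifies it, so the sketch can be run without a Nomad client.
func demo() (string, string, error) {
	dir, err := os.MkdirTemp("", "csi-link")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	link := filepath.Join(dir, "rw-file-system-multi-node-multi-writer")
	missing := filepath.Join(dir, "staging", "code_server")
	if err := os.Symlink(missing, link); err != nil {
		return "", "", err
	}
	return classify(link)
}

func main() {
	run := demo
	if len(os.Args) == 2 {
		// Inspect a real path, e.g. the per-alloc publish path on the host.
		run = func() (string, string, error) { return classify(os.Args[1]) }
	}
	kind, target, err := run()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(kind, target)
}
```

Run on the Nomad client host against the per-alloc path from the error message while the volume is published, a "dangling symlink" result would support the explanation above; the same path checked from inside the plugin container would be expected to resolve.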

Maybe it should use a bind mount rather than a symlink?

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 16 (10 by maintainers)

Most upvoted comments

@danlsgiga you can build from the Dockerfile at cmd/seaweedfs-csi-driver/Dockerfile, which is what I did to test. That’s also what I’ll be doing going forward, pushing to a private registry so that I can version and revert very quickly if I need to.

@garenchan sorry, please ignore my previous comment about it not working. I’ve just rebuilt the SeaweedFS cluster and it’s working like a dream now; I must have mangled the cluster when I was testing something else.