democratic-csi: Unable to mount iscsi devices in nomad deployment
Trying to turn up an *arr application with a /config directory provided by an iscsi mount. I was able to create the volume via csc, and then set it up in nomad via terraform. However the last step, mounting it into the container, does not seem to work.
I have tried both as a file-system and a block-device attachment mode.
There must be something simple and obvious that I’m doing wrong, since I can’t find anything on the internet about this error message, but I’m not sure where to start looking.
Thanks in advance, again!
allocation logs:
2021-08-29T16:14:46-05:00 Setup Failure failed to setup alloc: pre-run hook "csi_hook" failed: node plugin returned an internal error, check the plugin allocation logs for more information: rpc error: code= Internal desc = {"code":125,"stdout":"","stderr":"chroot: cannot change root directory to '/host': No such file or directory\n"}
when I go look at the stderr for the node job I have this output:
{
code: 13,
message: `{"code":125,"stdout":"","stderr":"chroot: cannot change root directory to '/host': No such file or directory\\n"}`
}
stdout contains, along with a lot of status messages,
{"service":"democratic-csi","level":"info","message":"new request - driver: FreeNASDriver method: NodeStageVolume call: {\"_events\":{},\"_eventsCount\":1,\"call\":{},\"cancelled\":false,\"metadata\":{\"_internal_repr\":{\"user-agent\":[\"grpc-go/1.29.1\"]},\"flags\":0},\"request\":{\"publish_context\":{},\"secrets\":\"redacted\",\"volume_context\":{\"iqn\":\"iqn.2005-10.org.freenas.ctl:csi-radarr-volume-discovery\",\"lun\":\"0\",\"node_attach_driver\":\"iscsi\",\"portal\":\"10.10.220.110:3260\",\"provisioner_driver\":\"freenas-iscsi\"},\"volume_id\":\"radarr-volume\",\"staging_target_path\":\"/csi-data/staging/radarr-volume/rw-file-system-single-node-writer\",\"volume_capability\":{\"access_mode\":{\"mode\":\"SINGLE_NODE_WRITER\"},\"mount\":{\"mount_flags\":[\"noatime\"],\"fs_type\":\"ext4\"},\"access_type\":\"mount\"}}}"}
{"service":"democratic-csi","level":"debug","message":"operation lock keys: [\"volume_id_radarr-volume\"]"}
{"service":"democratic-csi","level":"verbose","message":"validating capabilities: [{\"access_mode\":{\"mode\":\"SINGLE_NODE_WRITER\"},\"mount\":{\"mount_flags\":[\"noatime\",\"defaults\"],\"fs_type\":\"ext4\"},\"access_type\":\"mount\"}]"}
executing fileystem command: stat /csi-data/staging/radarr-volume/rw-file-system-single-node-writer
executing iscsi command: iscsiadm -m node -T iqn.2005-10.org.freenas.ctl:csi-radarr-volume-discovery -p 10.10.220.110:3260 -o new
{"service":"democratic-csi","level":"error","message":"handler error - driver: FreeNASDriver method: NodeStageVolume error: {\"code\":125,\"stdout\":\"\",\"stderr\":\"chroot: cannot change root directory to '/host': No such file or directory\\n\"}"}
{"service":"democratic-csi","level":"info","message":"new request - driver: FreeNASDriver method: NodeUnpublishVolume call: {\"_events\":{},\"_eventsCount\":1,\"call\":{},\"cancelled\":false,\"metadata\":{\"_internal_repr\":{\"user-agent\":[\"grpc-go/1.29.1\"]},\"flags\":0},\"request\":{\"volume_id\":\"radarr-volume\",\"target_path\":\"/csi-data/per-alloc/f08e66af-efc9-b7a6-e3d6-750de3ad87fb/radarr-volume/rw-file-system-single-node-writer\"}}"}
{"service":"democratic-csi","level":"debug","message":"operation lock keys: [\"volume_id_radarr-volume\"]"}
executing mount command: findmnt --mountpoint /csi-data/per-alloc/f08e66af-efc9-b7a6-e3d6-750de3ad87fb/radarr-volume/rw-file-system-single-node-writer --output source,target,fstype,label,options,avail,size,used -b -J
executing fileystem command: stat /csi-data/per-alloc/f08e66af-efc9-b7a6-e3d6-750de3ad87fb/radarr-volume/rw-file-system-single-node-writer
failed to execute filesystem command: stat /csi-data/per-alloc/f08e66af-efc9-b7a6-e3d6-750de3ad87fb/radarr-volume/rw-file-system-single-node-writer, response: {"code":1,"stdout":"","stderr":"stat: cannot stat '/csi-data/per-alloc/f08e66af-efc9-b7a6-e3d6-750de3ad87fb/radarr-volume/rw-file-system-single-node-writer': No such file or directory\n"}
{"service":"democratic-csi","level":"info","message":"new response - driver: FreeNASDriver method: NodeUnpublishVolume response: {}"}
{"service":"democratic-csi","level":"info","message":"new request - driver: FreeNASDriver method: NodeUnstageVolume call: {\"_events\":{},\"_eventsCount\":1,\"call\":{},\"cancelled\":false,\"metadata\":{\"_internal_repr\":{\"user-agent\":[\"grpc-go/1.29.1\"]},\"flags\":0},\"request\":{\"volume_id\":\"radarr-volume\",\"staging_target_path\":\"/csi-data/staging/radarr-volume/rw-file-system-single-node-writer\"}}"}
{"service":"democratic-csi","level":"debug","message":"operation lock keys: [\"volume_id_radarr-volume\"]"}
executing mount command: findmnt --mountpoint /csi-data/staging/radarr-volume/rw-file-system-single-node-writer/block_device --output source,target,fstype,label,options,avail,size,used -b -J
executing mount command: findmnt --mountpoint /csi-data/staging/radarr-volume/rw-file-system-single-node-writer --output source,target,fstype,label,options,avail,size,used -b -J
executing mount command: findmnt --mountpoint /csi-data/staging/radarr-volume/rw-file-system-single-node-writer --output source,target,fstype,label,options,avail,size,used -b -J
executing fileystem command: stat /csi-data/staging/radarr-volume/rw-file-system-single-node-writer
executing fileystem command: rmdir /csi-data/staging/radarr-volume/rw-file-system-single-node-writer
{"service":"democratic-csi","level":"info","message":"new response - driver: FreeNASDriver method: NodeUnstageVolume response: {}"}
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 70 (45 by maintainers)
Basic docs and examples started here: https://github.com/democratic-csi/democratic-csi/tree/master/docs/Nomad
No, zero ports are opened there.
Node Job Config
I’m going to attempt installing a single-node Nomad setup and get myself a little more familiar with the ecosystem. Maybe with a bit more knowledge I can be of more help.

I may have some updates to the Ubuntu setup of the iSCSI things. I need to verify it, and I’ve not had time to set up a new host node from scratch yet.
That’s fair. I’ve never had any issues and I’m aware of some very large organizations using the driver who have never reported any issues either…not that either of those is much consolation 😦
Also note the idea is common in the arena: https://www.docker.com/blog/road-to-containing-iscsi/
You could hypothetically copy the binary from the host to an isolated chroot but you’d need to copy all the linked libraries etc with it which could prove challenging. Mounting / is much easier unfortunately.
https://github.com/democratic-csi/democratic-csi/blob/master/docs/Nomad/examples/democratic-csi-iscsi-node.hcl#L35
The reason the `/` mount is required is because the driver actually uses the `iscsiadm` binary from the host (that binary must match versions with the daemon etc, making it effectively impossible to just install and use directly in the driver container). This is all pretty transparent in the node driver container because I have a little shell script (https://github.com/democratic-csi/democratic-csi/blob/master/docker/iscsiadm) called `iscsiadm` which is simply a wrapper that invokes the real binary using `chroot` with…you guessed it, `/host` as the root 😃

The basic premise with csi is that the nodes themselves handle mounting before the target container is ever created (which is why you don’t need iscsi tools in every container etc). So effectively what happens is: Nomad starts the high-level process to launch a new workload and sees a volume is at play, so Nomad sends a gRPC request to the appropriate csi driver on the appropriate node telling it to mount the volume (stage and/or publish in csi parlance). The csi driver is instructed to mount the volume at a path designated by Nomad. After that succeeds, when the workload container actually launches, Nomad automatically does a bind mount to the target location, using the previously mentioned designated path on the host as the source dir.
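The wrapper idea can be sketched as a tiny shell function. To be clear, this is not the repo’s actual `docker/iscsiadm` script, just an illustration of the pattern; the `CHROOT_DIR` variable and the dry-run `echo` are my own assumptions (a real wrapper would exec the command, which requires root and a populated `/host`).

```shell
# Sketch of the chroot-wrapper pattern used by the node container.
# The echo is illustrative; a real wrapper would run the command instead
# of printing it.
host_iscsiadm() {
  # Invoke the host's own iscsiadm so its version matches the host's iscsid.
  echo chroot "${CHROOT_DIR:-/host}" iscsiadm "$@"
}
```

This is why the error `chroot: cannot change root directory to '/host'` shows up the moment the wrapper runs without the host root bind-mounted into the container.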
If those are the exact resources you used then they are very low and I’m not surprised if you had erratic behavior. I’m not sure what health check features are built-in to Nomad to automatically restart containers in failure scenarios but those would be good to add to the files I’m putting together for sure.
https://discuss.hashicorp.com/t/setting-up-democratic-csi-volumes-using-nomad/29077
Just to crosspost, in case someone comes across this and is wondering the same things.
I do have an application running with a volume created via csc, registered via terraform, and mounted as a file-system. It’s stayed up and running for quite a while. I haven’t had an opportunity to start/stop it so it ends up on a new node; I only have one amd64 node at the moment, and the app needs the amd64 arch.
I think it’s probably good. I need to redo some of the iSCSI setup on the host itself; I think Ubuntu is doing some things that are incompatible with how this plugin expects iscsi to work, like not starting the service if the /etc/iscsi/nodes folder is empty. That folder may legitimately be empty in this case, since things are dynamically registered, so it’s dumb. Simple overrides with systemd should solve it. I’ll write up a blog post about it. I’m also about to post in the Nomad community forums about the volume configuration and see if I can’t figure out why it’s not working.
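For reference, that kind of Ubuntu quirk can typically be neutralized with a small systemd drop-in. This is a hedged sketch only: the unit name (`open-iscsi.service` vs `iscsid.service`) and the exact condition being cleared are assumptions about the distro packaging, so check `systemctl cat open-iscsi` on your host first.

```ini
# Hypothetical drop-in at /etc/systemd/system/open-iscsi.service.d/override.conf
# Assumption: the stock unit gates startup on a non-empty /etc/iscsi/nodes via
# a ConditionDirectoryNotEmpty= line; declaring it empty clears the inherited
# condition so the service starts even when the nodes DB is empty.
[Unit]
ConditionDirectoryNotEmpty=
```

Apply with `systemctl daemon-reload` and restart the unit (again, names assumed).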
Thanks for all your help!
Gotcha, I appreciate that. I wasn’t sure if it had something to do with iscsi vs nfs.
That appears to be something to take up with the Nomad team. By default I don’t bind the server to any tcp port but only a unix domain socket. Maybe try to delete the container and let it restart to see if it becomes healthy?
Yeah, I’m near certain that it’s because you commented out or removed the `targetPortals` key in the config…it should be an empty array value.

In the upcoming code I handle this case gracefully, but it’s not in master just yet (probably will be by end of week, as I have about 4 months of work ready to land).
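In config terms that looks roughly like the following. This is an illustrative excerpt only: the key names follow the freenas-iscsi driver examples, and the portal value is taken from the logs above.

```yaml
iscsi:
  targetPortal: "10.10.220.110:3260"
  # Keep this key present as an empty array; commenting it out or removing it
  # is what triggers the failure discussed here.
  targetPortals: []
```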
I should also note that I’m relatively confident Nomad has expanded csi support even further, eliminating the need to manually use `csc` etc for volume creation and all that: https://github.com/hashicorp/nomad/issues/8212

HOLY CRAP IT WORKED.
I needed to delete that existing iqn file. It never should’ve been there. I probably set it up while trying to figure out if my iscsi stuff was working by running `iscsiadm -m discovery -t st -p 10.10.220.110`. That created an entry in the `/etc/iscsi/nodes` folder that was causing problems, I think.

Deleting that, a new one was created, and the task is placed! Now to see if it’s actually working… Yeah, it’s seeing the 5GB volume that I created, and it’s working just fine now. I’ll see if I can give it a bit of soak time, and make sure things work from scratch again.
I ended up using this issue as a sort of running log of my fails, so it’s a stream of consciousness here. The bottom parts are the most recent.
Well, I’ve made sure I’m running with `--net host` here and `--ipc host` here.

Unfortunately, I still get the same error message.
Oh wait, I had to `mkdir /etc/iscsi/ifaces` and then I got a bit further.

I just need to figure out why it’s a read-only filesystem, even though I explicitly told it not to be. The mount for `/host` was read-only; it’s now read-write.

Next fail:
I don’t have a file ending in `,3260`; I have one ending in `,3260,2`… I think there’s something there, but I’m not sure what it should be. This leads me to believe that something’s wrong with my volume configuration… It’s missing a value.

Inside `/etc/iscsi/nodes/iqn.2005-10.org.freenas.ctl:csi-radarr-volume-discovery/10.10.220.110,3260,2/default` there’s a `node.tpgt = 2` value; maybe that’s somehow missing in my volume config? It’s the Target Portal Group Tag, and it’s 2 in my case. Somehow the file is different for what it’s looking for, vs what it found.
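For anyone hitting the same mismatch, the node record in question looks roughly like this. This is an illustrative excerpt: field names follow open-iscsi’s nodes DB format, with the values from this thread filled in.

```text
# /etc/iscsi/nodes/iqn.2005-10.org.freenas.ctl:csi-radarr-volume-discovery/10.10.220.110,3260,2/default
# (excerpt; the directory name encodes <portal>,<port>,<tpgt>)
node.name = iqn.2005-10.org.freenas.ctl:csi-radarr-volume-discovery
node.tpgt = 2
node.conn[0].address = 10.10.220.110
node.conn[0].port = 3260
```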
That error could be related to the `pod` container not running in the host networking namespace (ie: `--net host`, however you achieve that with Nomad).

Awesome! Have you run through the node prep steps in the doc relating to iscsi? That should be a pretty good start. Note that technically multipath is not required, but you are welcome to use it if it makes sense for your scenario.
That is correct. All the mounts/settings mentioned should be for the node application deployed to the cluster and not the actual workloads. Sorry that was confusing. Essentially the node application needs to run highly privileged and with a bunch of the host tools available to it so it can do the business of attaching to iscsi targets, formatting disks, etc, etc.
It shouldn’t be at csi hook time (if I understand you correctly) but rather when you deploy the node portion of the driver. All of the previous comment had to do with how the containers are deployed; they should be running before any workload attempts to start.
My experience with Nomad is essentially 0, so forgive my ignorance. However the problem at hand here is that you need to mount the host `/` into `/host` inside the node container (this would be great to document how exactly this is achieved with Nomad). I would recommend browsing through this to get ideas of what’s needed for iscsi as well: https://github.com/democratic-csi/charts/blob/master/stable/democratic-csi/templates/node.yaml

From a cursory look it appears you’ll need:

- the host properly set up for iscsi (`iscsiadm` for example must be available directly on the host, you should have `iscsid` started, etc)
- `/` mounted to `/host` in the container
- `/etc/iscsi` mounted to `/etc/iscsi` in the container (bidirectional)
- `/var/lib/iscsi` mounted to `/var/lib/iscsi` in the container (bidirectional)
- host networking (`--net host` or equivalent)
- host IPC (`--ipc host` or equivalent…this is for multipath specifically)

I think that should get you going on the right track.
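In a Nomad docker task those requirements map to something like the following. This is a hedged sketch, not the actual file from the repo’s docs/Nomad examples; the image tag is an assumption, and bidirectional propagation for the iscsi state dirs may need `mount` blocks with `bind_options` rather than simple `volumes` entries, depending on your Nomad version.

```hcl
task "node" {
  driver = "docker"

  config {
    image        = "democraticcsi/democratic-csi:latest" # tag is an assumption
    privileged   = true    # needed to attach targets and format disks
    network_mode = "host"  # --net host equivalent
    ipc_mode     = "host"  # --ipc host equivalent (for multipath)

    # Host paths the driver needs inside the container.
    volumes = [
      "/:/host",
      "/etc/iscsi:/etc/iscsi",
      "/var/lib/iscsi:/var/lib/iscsi",
    ]
  }
}
```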