ceph-csi: CephFS mount syntax not updated for Quincy
Describe the bug
Apparently there was a significant change in the mount.ceph syntax between Ceph Pacific and Quincy. However, the Ceph-CSI code does not seem to have been updated to support the new syntax.
I use Nomad 1.3.1 and I am trying to use Ceph-CSI to provide CephFS-based volumes to Nomad jobs. I tried the 3.6.2 version of Ceph-CSI (which is already based on Quincy) to mount a CephFS volume from a cluster running Ceph 17.2.0.
I use Nomad instead of Kubernetes, but I don’t think this fact affects this bug.
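As I understand it, the device-string format accepted by mount.ceph changed roughly as sketched below. This is only an illustration using this cluster's monitors, clusterID (assuming it equals the Ceph fsid) and fsName; the mount point and secret file path are made up, and these are not the exact commands ceph-csi runs:

# Pacific-style device string: monitor addresses in front of the path
mount -t ceph 192.168.1.10,192.168.1.11,192.168.1.12:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

# Quincy-style device string: <user>@<fsid>.<fsname>=<path>, monitors passed via mon_addr (slash-separated)
mount -t ceph admin@67b72852-d1b8-45ad-b1f8-edb8c150ff9b.nomadfs=/ /mnt/cephfs -o mon_addr=192.168.1.10/192.168.1.11/192.168.1.12,secretfile=/etc/ceph/admin.secret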
Environment details
- Image/version of Ceph CSI driver : 3.6.2
- Helm chart version : N/A
- Kernel version : 5.15.41-0-lts
- Mounter used for mounting PVC (for CephFS: fuse or kernel; for RBD: krbd or rbd-nbd) : kernel
- Kubernetes cluster version : N/A
- Ceph cluster version : 17.2.0
Steps to reproduce
Steps to reproduce the behavior:
- Setup Nomad 1.3.x (can run in dev mode) and Ceph 17.2
- In Ceph, create a CephFS called nomadfs and an admin user (a sketch of the Ceph-side commands follows this list)
- Deploy the CSI Controller Plugin job using: nomad job run ceph-csi-plugin-controller.nomad
- Deploy the CSI Node Plugin job using: nomad job run ceph-csi-plugin-nodes.nomad
- Register sample-fs-volume.hcl by running: nomad volume register sample-fs-volume.hcl
- Deploy mysql-fs.nomad, which tries to use the volume created in the previous step, using: nomad job run mysql-fs.nomad
- Observe the error in the ceph-mysql-fs job allocation logs.
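For reference, the Ceph side can be prepared with something like the commands below (a minimal sketch; the admin user already exists on a fresh cluster, and its key is read out for the volume's secrets block):

# Create the CephFS used above; on an orchestrated cluster this also deploys the MDS daemons
ceph fs volume create nomadfs
# Print the key of client.admin for use as adminKey/userKey
ceph auth get-key client.admin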
ceph-csi-plugin-controller.nomad:
job "ceph-fs-csi-plugin-controller" {
datacenters = ["dc1"]
group "controller" {
network {
port "metrics" {}
}
task "ceph-controller" {
driver = "docker"
template {
data = jsonencode([{
clusterID = "67b72852-d1b8-45ad-b1f8-edb8c150ff9b"
monitors = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
}])
destination = "local/config.json"
change_mode = "restart"
}
config {
image = "quay.io/cephcsi/cephcsi:v3.6.2"
volumes = [
"./local/config.json:/etc/ceph-csi-config/config.json"
]
mounts = [
{
type = "tmpfs"
target = "/tmp/csi/keys"
readonly = false
tmpfs_options = {
size = 1000000 # size in bytes
}
}
]
args = [
"--type=cephfs",
"--controllerserver=true",
"--drivername=cephfs.csi.ceph.com",
"--endpoint=unix://csi/csi.sock",
"--nodeid=${node.unique.name}",
"--instanceid=${node.unique.name}-controller",
"--pidlimit=-1",
"--logtostderr=true",
"--v=5",
"-stderrthreshold=0",
"--metricsport=$${NOMAD_PORT_metrics}"
]
}
resources {
cpu = 500
memory = 256
}
service {
name = "ceph-fs-csi-controller"
port = "metrics"
tags = [ "prometheus" ]
}
csi_plugin {
id = "ceph-fs-csi"
type = "controller"
mount_dir = "/csi"
}
}
}
}
ceph-csi-plugin-nodes.nomad:
job "ceph-fs-csi-plugin-nodes" {
datacenters = ["dc1"]
type = "system"
group "nodes" {
network {
port "metrics" {}
}
task "ceph-node" {
driver = "docker"
template {
data = jsonencode([{
clusterID = "67b72852-d1b8-45ad-b1f8-edb8c150ff9b"
monitors = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
}])
destination = "local/config.json"
change_mode = "restart"
}
config {
mount {
type = "tmpfs"
target = "/tmp/csi/keys"
readonly = false
tmpfs_options = {
size = 1000000 # size in bytes
}
}
mount {
type = "bind"
source = "/lib/modules/${attr.kernel.version}"
target = "/lib/modules/${attr.kernel.version}"
readonly = true
}
image = "quay.io/cephcsi/cephcsi:v3.6.2"
privileged = true
volumes = [
"./local/config.json:/etc/ceph-csi-config/config.json"
]
args = [
"--type=cephfs",
"--drivername=cephfs.csi.ceph.com",
"--nodeserver=true",
"--endpoint=unix://csi/csi.sock",
"--nodeid=${node.unique.name}",
"--instanceid=${node.unique.name}-nodes",
"--pidlimit=-1",
"--logtostderr=true",
"--v=5",
"--metricsport=$${NOMAD_PORT_metrics}"
]
}
resources {
cpu = 500
memory = 256
}
service {
name = "ceph-fs-csi-nodes"
port = "metrics"
tags = [ "prometheus" ]
}
csi_plugin {
id = "ceph-fs-csi"
type = "node"
mount_dir = "/csi"
}
}
}
}
sample-fs-volume.hcl:
id = "ceph-mysql-fs"
name = "ceph-mysql-fs"
type = "csi"
plugin_id = "ceph-fs-csi"
external_id = "nomadfs"
capability {
  access_mode     = "multi-node-multi-writer"
  attachment_mode = "file-system"
}
secrets {
  adminID  = "admin"
  adminKey = "AQDKpPtiDr30NRAAsqtMLh0WHUqZ0L4f2S/ouA=="
  userID   = "admin"
  userKey  = "AQDKpPtiDr30NRAAsqtMLh0WHUqZ0L4f2S/ouA=="
}
parameters {
  clusterID = "67b72852-d1b8-45ad-b1f8-edb8c150ff9b"
  fsName    = "nomadfs"
}
context {
  monitors        = "192.168.1.10,192.168.1.11,192.168.1.12"
  provisionVolume = "false"
  rootPath        = "/"
}
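After registering, the volume can be checked with the command below (assuming the default namespace); it should show the plugin, access mode and, once scheduled, the allocations using it:

nomad volume status ceph-mysql-fs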
mysql-fs.nomad:
variable "mysql_root_password" {
description = "Password for MySQL root user"
type = string
default = "password"
}
job "mysql-server-fs" {
datacenters = ["dc1"]
type = "service"
group "mysql-server-fs" {
count = 1
volume "ceph-mysql-fs" {
type = "csi"
attachment_mode = "file-system"
access_mode = "multi-node-multi-writer"
read_only = false
source = "ceph-mysql-fs"
}
network {
port "db" {
static = 3306
}
}
restart {
attempts = 10
interval = "5m"
delay = "25s"
mode = "delay"
}
task "mysql-server" {
driver = "docker"
volume_mount {
volume = "ceph-mysql-fs"
destination = "/srv"
read_only = false
}
env {
MYSQL_ROOT_PASSWORD = "${var.mysql_root_password}"
}
config {
image = "hashicorp/mysql-portworx-demo:latest"
args = ["--datadir", "/srv/mysql"]
ports = ["db"]
}
resources {
cpu = 500
memory = 1024
}
service {
provider = "nomad"
name = "mysql-server"
port = "db"
}
}
}
}
Actual results
Ceph-CSI node plugin failed to mount CephFS.
Expected behavior
Ceph-CSI node plugin should successfully mount CephFS using the new mount.ceph syntax.
Logs
nomad alloc status events:
Recent Events:
Time Type Description
2022-08-16T12:32:18+02:00 Setup Failure failed to setup alloc: pre-run hook "csi_hook" failed: node plugin returned an internal error, check the plugin allocation logs for more information: rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 192.168.1.10,192.168.1.11,192.168.1.12:/ /local/csi/staging/ceph-mysql-fs/rw-file-system-multi-node-multi-writer -o name=admin,secretfile=/tmp/csi/keys/keyfile-2337295656,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
2022-08-16T10:31:57.974+0000 7fdda8b9df40 -1 failed for service _ceph-mon._tcp
mount error: no mds server is up or the cluster is laggy
I suspect the unable to get monitor info from DNS SRV error happens because the mount.ceph helper in 17.x no longer recognizes monitor IPs passed this way and falls back to using DNS SRV records.
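For illustration only, the same mount rewritten with the Quincy device string would look roughly like this (reusing the clusterID as the fsid and the fsName from the volume definition; whether ceph-csi should emit exactly these options is an assumption on my part):

# name=admin moves into the device string; monitor IPs move to the slash-separated mon_addr option
mount -t ceph admin@67b72852-d1b8-45ad-b1f8-edb8c150ff9b.nomadfs=/ /local/csi/staging/ceph-mysql-fs/rw-file-system-multi-node-multi-writer -o mon_addr=192.168.1.10/192.168.1.11/192.168.1.12,secretfile=/tmp/csi/keys/keyfile-2337295656,_netdev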
Additional context
This seems to be an active issue, where the only workaround is downgrading cephcsi. There is also an open PR for it. Should it be reopened?
As a data point, it affects me too:
Changing the CSI ConfigMap fixes it: from:
to:
Someone has to make the first report 😉 maybe this use-case is not very popular?
Yes.
Yes, I get the same error:
Yes, same error:
When I try to mount using the Quincy mount.ceph syntax, it works:
EDIT: I just tested with quay.io/cephcsi/cephcsi:v3.5.1 (which is based on Pacific), and the mount commands that previously failed do work there.