kubernetes: Custom FlexVolume no longer working after 1.6.3 update
Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): Yes and no, the behavior of the flex volume interface has changed between 1.5 and 1.6 and I’m struggling to understand/change my implementation.
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): flex volume, VerifyControllerAttachedVolume, isattached, getvolumename
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT / DOCUMENTATION_ISSUE?
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.3", GitCommit:"0480917b552be33e2dba47386e51decb1a211df6", GitTreeState:"clean", BuildDate:"2017-05-10T15:38:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.3", GitCommit:"0480917b552be33e2dba47386e51decb1a211df6", GitTreeState:"clean", BuildDate:"2017-05-10T15:38:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: AWS
- OS (e.g. from /etc/os-release): NAME="Container Linux by CoreOS" ID=coreos VERSION=1353.7.0 VERSION_ID=1353.7.0 BUILD_ID=2017-04-26-2154 PRETTY_NAME="Container Linux by CoreOS 1353.7.0 (Ladybug)" ANSI_COLOR="38;5;75" HOME_URL="https://coreos.com/" BUG_REPORT_URL="https://issues.coreos.com"
- Kernel (e.g. uname -a): Linux ip-10-20-14-151.eu-west-1.compute.internal 4.9.24-coreos #1 SMP Wed Apr 26 21:44:23 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz GenuineIntel GNU/Linux
- Install tools: Custom Implementation
What happened:
My flex volume, "bounded-local", mounts a local sparse file which can also automatically be bind-mounted to another location to be picked up for log collection by Splunk, Fluentd, etc. It doesn't need devices, so in the previous working implementation on 1.5 the attach and detach operations were left "Not Supported" and only init, mount and umount were used.
Testing the 1.6 upgrade, I have found that my plugin as implemented no longer works and emits this error:
E0516 12:13:26.313138 10531 desired_state_of_world_populator.go:272] Failed to add volume "logging" (specName: "logging") for pod "bfd94f2d-397e-11e7-9952-06d9bc92ad18" to desiredStateOfWorld. err=failed to GetUniqueVolumeNameFromSpec for volumeSpec "logging" using volume plugin "zopa.com/bounded-local" err=failed to GetVolumeName from volumePlugin for volumeSpec "logging" err=invalid character 'I' looking for beginning of value
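For context, "invalid character 'I' looking for beginning of value" is Go's encoding/json error: the kubelet JSON-decodes everything the driver prints to stdout, so any non-JSON output (a bare "Not supported" string, a stray log line) fails in exactly this way. A minimal sketch of JSON-only responses (the helper names here are illustrative, not from the original driver):

```shell
# Every FlexVolume call must emit a single JSON object on stdout, because
# the kubelet JSON-decodes the driver's output. These helpers (hypothetical
# names) show the two response shapes used throughout this issue.

print_success() {
  printf '{"status": "Success"}'
  exit 0
}

print_not_supported() {
  # Even "Not supported" must be valid JSON, not plain text.
  printf '{"status": "Not supported"}'
  exit 1
}
```

Any diagnostic logging a driver needs should go to a file or to stderr, never to stdout.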
This has led me to implement getvolumename and produce a random name (because a unique name cannot be derived from the JSON options):
getvolumename() {
UUID=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 64 ; echo '')
log "{\"status\": \"Success\", \"volumeName\":\"${UUID}\"}"
exit 0
}
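A possible alternative to a random name (this is my assumption, not part of the original driver): derive a deterministic volumeName by hashing the JSON options the kubelet passes as the first argument, so repeated calls for the same volume spec always agree and the kubelet does not track one volume under several unique names:

```shell
# Hypothetical deterministic getvolumename: hash the JSON options ("$1")
# so the same volume spec always maps to the same volumeName.
getvolumename() {
  local json_opts="$1"
  local hash
  hash=$(printf '%s' "${json_opts}" | sha256sum | cut -d' ' -f1)
  printf '{"status": "Success", "volumeName": "%s"}' "${hash}"
  exit 0
}
```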
I have also tried implementing stub functions for the plugin hooks that I don't really require:
attach() {
log "{\"status\": \"Success\", \"attached\":true, \"device\": \"/dev/bounded-local\"}"
exit 0
}
isattached() {
log "{\"status\": \"Success\", \"attached\":true}"
exit 0
}
waitforattach() {
log "{\"status\": \"Success\", \"device\": \"/dev/bounded-local\"}"
exit 0
}
detach() {
log "{\"status\": \"Success\"}"
exit 0
}
The full code is here https://github.com/davidmccormick/bounded-local-controller/blob/1.6_upgrades/bounded-local
Now I am stuck with this error message:
E0516 11:22:04.276796 23422 nestedpendingoperations.go:262] Operation for "zopa.com/bounded-local/VJEyExr7D3abk6o5xcOBZySqJL9pOd7pYIwnHmXaTLf6k2p0rXvjGSaVMH8FBN44" failed. No retries permitted until 2017-05-16 11:22:04.776776935 +0000 UTC (durationBeforeRetry 500ms). Error: Volume "zopa.com/bounded-local/VJEyExr7D3abk6o5xcOBZySqJL9pOd7pYIwnHmXaTLf6k2p0rXvjGSaVMH8FBN44" (spec.Name: "logging") pod "c319829f-397e-11e7-9952-06d9bc92ad18" (UID: "c319829f-397e-11e7-9952-06d9bc92ad18") has not yet been added to the list of VolumesInUse in the node's volume status
What you expected to happen:
I'm looking for help to implement a simple flexvolume that doesn't need to do any attaching or detaching, just mounts and unmounts. I'm not sure what I need to do to keep the kubelet on the happy path so that it runs my mount and unmount functions.
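A mount/unmount-only driver might be sketched as below. This is a hypothetical skeleton, not the original bounded-local code: the sparse-file setup is elided as a comment, and the dispatch is written as a function so the JSON responses are easy to see. A real driver script would end with `driver "$@"` and exit with its status.

```shell
# Sketch of a FlexVolume driver that only implements init/mount/unmount
# and answers "Not supported" (as JSON) for everything else.
driver() {
  local op="$1"; shift
  case "$op" in
    init)
      printf '{"status": "Success"}'
      ;;
    mount)
      local mnt_dir="$1"   # target directory supplied by the kubelet
      # "$2" would carry the JSON options (size, cleanupDelay, ...)
      mkdir -p "$mnt_dir"
      # ... create the sparse file and loop-mount it here ...
      printf '{"status": "Success"}'
      ;;
    unmount)
      local mnt_dir="$1"
      umount "$mnt_dir" 2>/dev/null || true
      printf '{"status": "Success"}'
      ;;
    *)
      printf '{"status": "Not supported"}'
      return 1
      ;;
  esac
}
```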
How to reproduce it (as minimally and precisely as possible):
Add the file /etc/kubernetes/vol-plugins/zopa.com~bounded-local/bounded-local from https://github.com/davidmccormick/bounded-local-controller/blob/1.6_upgrades/bounded-local
Run the kubelet with:
[Unit]
Description=Kubernetes Kubelet
Requires=pre-kubelet.service
[Service]
ExecStartPre=-/usr/bin/docker stop kubelet
ExecStartPre=-/usr/bin/docker rm kubelet
ExecStartPre=/usr/bin/docker run --rm \
-v /opt/bin:/hostbin:rw \
gcr.io/google_containers/hyperkube:v1.6.3 \
bash -c "cp -p /hyperkube /hostbin/kubectl"
ExecStart=/usr/bin/docker run --rm \
--name kubelet \
--volume=/:/rootfs:ro \
--volume=/sys:/sys:ro \
--volume=/dev:/dev \
--volume=/var/lib/docker/:/var/lib/docker:rw \
--volume=/var/lib/kubelet/:/var/lib/kubelet:shared \
--volume=/etc:/etc:rw \
--volume=/var/run:/var/run:rw \
--net=host --pid=host --privileged=true \
gcr.io/google_containers/hyperkube:v1.6.3 \
/hyperkube kubelet \
--node-labels="type=worker,ingress=true" \
--hostname-override=10.20.13.194 \
--register-node=true \
--containerized \
--cgroup-driver=systemd \
--pod-manifest-path=/etc/kubernetes/manifests \
--allow-privileged=true \
--network-plugin=cni \
--cni-conf-dir=/etc/cni/net.d \
--cni-bin-dir=/opt/cni/bin \
--cluster-dns=192.168.0.10 \
--cluster-domain=cluster.local \
--v=256 \
--require-kubeconfig=true \
--kubeconfig=/etc/kubernetes/node-kubeconfig.yaml \
--tls-cert-file=/etc/kubernetes/ssl/kubelet.pem \
--tls-private-key-file=/etc/kubernetes/ssl/kubelet-key.pem \
--volume-plugin-dir=/etc/kubernetes/vol-plugins \
--cloud-config=/etc/kubernetes/cloud_config \
--cloud-provider=aws
ExecStop=/usr/bin/docker stop kubelet
TimeoutStartSec=0
Restart=always
RestartSec=10s
[Install]
WantedBy=multi-user.target
Here is a podspec for a test pod which creates a random log file within the flex volume:
apiVersion: v1
kind: Pod
metadata:
  name: logging-test1
  annotations:
    launched_as: part_of_voltest
  labels:
    k8s-app: random_logger
    voltestpod: "true"
spec:
  containers:
  - name: logtest1
    image: davidmccormick/random_log_generator
    volumeMounts:
    - name: logging
      mountPath: /logs
  volumes:
  - name: logging
    flexVolume:
      driver: zopa.com/bounded-local
      fsType: ext4
      options:
        size: "4096"
        cleanupDelay: "60"
        logCollectCopy: "true"
  - name: podinfo
    downwardAPI:
      items:
      - path: namespace
        fieldRef:
          fieldPath: metadata.namespace
      - path: podname
        fieldRef:
          fieldPath: metadata.name
      - path: labels
        fieldRef:
          fieldPath: metadata.labels
      - path: annotations
        fieldRef:
          fieldPath: metadata.annotations
Anything else we need to know:
I appreciate that Flexvolume is in Alpha and therefore likely to change often. I'm happy to update as required, but I'm finding it hard from the documentation as it stands to understand how the new functions work and what needs to happen to be "added to the list of VolumesInUse in the node's volume status".
Thanks
Dave
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 28 (20 by maintainers)
@shrinandj Both exit statuses will work fine
Hi, this is now working for me. I had introduced a problem when I changed mount to expect 3 arguments, thinking I needed to support attach/detach, and then didn't reduce it back to 2 when I reverted to mount/unmount only (which was preferable).
So I still see lots of errors on attach/detach, but it then goes on to mount the volume.
This is improved if I change the default "Not Supported" response to a log message and exit 0. Can I safely do this?
Thanks for your help!
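The before/after snippets for this default-case change did not survive in the issue text. A plausible reconstruction, which is an assumption based on the surrounding description and on the maintainer reply above that both exit statuses work fine, is:

```shell
# Hypothetical reconstruction of the default case in the driver's dispatch.
# Written as functions with return (a real driver would use exit) so the
# two variants can be compared side by side.

# Before: JSON "Not supported" with a non-zero exit status.
default_before() {
  printf '{"status": "Not supported"}'
  return 1
}

# After: same JSON response, but exiting 0 so the kubelet logs less noise.
default_after() {
  printf '{"status": "Not supported"}'
  return 0
}
```

Either way, the response body must remain valid JSON; per the earlier comment, the exit status itself is not what trips the kubelet up.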