kubernetes: Custom FlexVolume no longer working after 1.6.3 update

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): Yes and no; the behavior of the FlexVolume interface changed between 1.5 and 1.6, and I'm struggling to understand and update my implementation.

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): flex volume, VerifyControllerAttachedVolume, isattached, getvolumename

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT / DOCUMENTATION_ISSUE?

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.3", GitCommit:"0480917b552be33e2dba47386e51decb1a211df6", GitTreeState:"clean", BuildDate:"2017-05-10T15:38:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.3", GitCommit:"0480917b552be33e2dba47386e51decb1a211df6", GitTreeState:"clean", BuildDate:"2017-05-10T15:38:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS

  • OS (e.g. from /etc/os-release): NAME="Container Linux by CoreOS" ID=coreos VERSION=1353.7.0 VERSION_ID=1353.7.0 BUILD_ID=2017-04-26-2154 PRETTY_NAME="Container Linux by CoreOS 1353.7.0 (Ladybug)" ANSI_COLOR="38;5;75" HOME_URL="https://coreos.com/" BUG_REPORT_URL="https://issues.coreos.com"

  • Kernel (e.g. uname -a): Linux ip-10-20-14-151.eu-west-1.compute.internal 4.9.24-coreos #1 SMP Wed Apr 26 21:44:23 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz GenuineIntel GNU/Linux

  • Install tools: Custom Implementation

What happened:

My flex volume, "bounded-local", mounts a local sparse file which can also automatically be bind-mounted to another location to be picked up for log collection by Splunk, Fluentd, etc. It doesn't need devices, so in the previous working implementation on 1.5 the attach and detach operations were left "Not Supported" and only init, mount and unmount were used.
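For context, the mount side does roughly the following (a simplified sketch; the sizes, paths and bind-mount target here are illustrative, not taken from the actual driver):

# Create and loop-mount a size-bounded sparse file (illustrative values)
truncate -s 4096M /var/lib/bounded-local/logging.img
mkfs.ext4 -q -F /var/lib/bounded-local/logging.img
mount -o loop /var/lib/bounded-local/logging.img "${MNTPATH}"

# Optionally bind-mount it to where Splunk/Fluentd watches for logs
mount --bind "${MNTPATH}" /var/log/collected/logging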

Testing the 1.6 upgrade I have found that my plugin as implemented no longer works and emits this error:

E0516 12:13:26.313138 10531 desired_state_of_world_populator.go:272] Failed to add volume "logging" (specName: "logging") for pod "bfd94f2d-397e-11e7-9952-06d9bc92ad18" to desiredStateOfWorld. err=failed to GetUniqueVolumeNameFromSpec for volumeSpec "logging" using volume plugin "zopa.com/bounded-local" err=failed to GetVolumeName from volumePlugin for volumeSpec "logging" err=invalid character 'I' looking for beginning of value

This led me to implement getvolumename and return a random name, because a unique name cannot be derived from the JSON options alone (the "invalid character" error above is the kubelet failing to parse the driver's output as JSON):

getvolumename() {
  # Generate a random 64-character alphanumeric name and report it as JSON
  UUID=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 64 ; echo '')

  log "{\"status\": \"Success\", \"volumeName\":\"${UUID}\"}"
  exit 0
}
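Note that the kubelet may call getvolumename more than once for the same volume, so a name that changes per call risks confusing its bookkeeping. A sketch of a deterministic alternative, assuming the JSON options arrive as the function's first argument:

getvolumename() {
  # Sketch: hash the JSON options so that repeated calls for the same
  # volume spec return the same name (assumes the options are in "$1")
  local name
  name=$(printf '%s' "$1" | sha256sum | awk '{print $1}')
  log "{\"status\": \"Success\", \"volumeName\": \"${name}\"}"
  exit 0
}

(Two volumes with identical options would then collide, so in practice some pod-specific context would need to be mixed into the hash.)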

I have also tried implementing stub functions for these plugin hooks that I don't really require:

attach() {
  log "{\"status\": \"Success\", \"attached\":true, \"device\": \"/dev/bounded-local\"}"
  exit 0
}

isattached() {
  log "{\"status\": \"Success\", \"attached\":true}"
  exit 0
}

waitforattach() {
  log "{\"status\": \"Success\", \"device\": \"/dev/bounded-local\"}"
  exit 0
}

detach() {
  log "{\"status\": \"Success\"}"
  exit 0
}

The full code is here: https://github.com/davidmccormick/bounded-local-controller/blob/1.6_upgrades/bounded-local

Now I am stuck with this error message:

E0516 11:22:04.276796 23422 nestedpendingoperations.go:262] Operation for "\"zopa.com/bounded-local/VJEyExr7D3abk6o5xcOBZySqJL9pOd7pYIwnHmXaTLf6k2p0rXvjGSaVMH8FBN44\"" failed. No retries permitted until 2017-05-16 11:22:04.776776935 +0000 UTC (durationBeforeRetry 500ms). Error: Volume "zopa.com/bounded-local/VJEyExr7D3abk6o5xcOBZySqJL9pOd7pYIwnHmXaTLf6k2p0rXvjGSaVMH8FBN44" (spec.Name: "logging") pod "c319829f-397e-11e7-9952-06d9bc92ad18" (UID: "c319829f-397e-11e7-9952-06d9bc92ad18") has not yet been added to the list of VolumesInUse in the node's volume status

What you expected to happen:

I'm looking for help implementing a simple flexvolume that doesn't need to do any attaching or detaching, just mounts and unmounts. I'm not sure what I need to do to keep the kubelet on the happy path so that it runs my mount and unmount functions.
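For anyone debugging the same thing, a cheap way to see exactly how the kubelet invokes the driver is to log every call at the top of the script before dispatching to the handlers (the log path is illustrative):

# First lines of the driver script: record the operation and arguments
# of every invocation so the calling convention is visible
LOGFILE=/var/log/bounded-local-debug.log
echo "$(date -Is) argc=$# argv: $*" >> "${LOGFILE}"

This makes it obvious which operations 1.6 actually calls and with how many arguments.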

How to reproduce it (as minimally and precisely as possible):

Add the file /etc/kubernetes/vol-plugins/zopa.com~bounded-local/bounded-local from https://github.com/davidmccormick/bounded-local-controller/blob/1.6_upgrades/bounded-local

Run the kubelet with:

[Unit]
Description=Kubernetes Kubelet
Requires=pre-kubelet.service

[Service]
ExecStartPre=-/usr/bin/docker stop kubelet
ExecStartPre=-/usr/bin/docker rm kubelet
ExecStartPre=/usr/bin/docker run --rm \
  -v /opt/bin:/hostbin:rw \
  gcr.io/google_containers/hyperkube:v1.6.3 \
    bash -c "cp -p /hyperkube /hostbin/kubectl"
ExecStart=/usr/bin/docker run --rm \
 --name kubelet \
 --volume=/:/rootfs:ro \
 --volume=/sys:/sys:ro \
 --volume=/dev:/dev \
 --volume=/var/lib/docker/:/var/lib/docker:rw \
 --volume=/var/lib/kubelet/:/var/lib/kubelet:shared \
 --volume=/etc:/etc:rw \
 --volume=/var/run:/var/run:rw \
 --net=host --pid=host --privileged=true \
 gcr.io/google_containers/hyperkube:v1.6.3 \
 /hyperkube kubelet \
  --node-labels="type=worker,ingress=true" \
  --hostname-override=10.20.13.194 \
  --register-node=true \
  --containerized \
  --cgroup-driver=systemd \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --allow-privileged=true \
  --network-plugin=cni \
  --cni-conf-dir=/etc/cni/net.d \
  --cni-bin-dir=/opt/cni/bin \
  --cluster-dns=192.168.0.10 \
  --cluster-domain=cluster.local \
  --v=256 \
  --require-kubeconfig=true \
  --kubeconfig=/etc/kubernetes/node-kubeconfig.yaml \
  --tls-cert-file=/etc/kubernetes/ssl/kubelet.pem \
  --tls-private-key-file=/etc/kubernetes/ssl/kubelet-key.pem \
  --volume-plugin-dir=/etc/kubernetes/vol-plugins \
  --cloud-config=/etc/kubernetes/cloud_config \
  --cloud-provider=aws
ExecStop=/usr/bin/docker stop kubelet
 
TimeoutStartSec=0
Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target

Here is a podspec for a test pod which creates a random log file within the flex volume:

apiVersion: v1
kind: Pod
metadata:
  name: logging-test1
  annotations:
    launched_as: part_of_voltest
  labels:
    k8s-app: random_logger
    voltestpod: "true"
spec:
  containers:
  - name: logtest1
    image: davidmccormick/random_log_generator
    volumeMounts:
    - name: logging
      mountPath: /logs
  volumes:
  - name: logging
    flexVolume:
      driver: zopa.com/bounded-local
      fsType: ext4
      options:
        size: "4096"
        cleanupDelay: "60"
        logCollectCopy: "true"
  - name: podinfo
    downwardAPI:
      items:
        - path: namespace
          fieldRef:
            fieldPath: metadata.namespace
        - path: podname
          fieldRef:
            fieldPath: metadata.name
        - path: labels
          fieldRef:
            fieldPath: metadata.labels
        - path: annotations
          fieldRef:
            fieldPath: metadata.annotations
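
With the driver file in place and the pod created, the following is enough to check the result (the podspec filename is illustrative; the kubelet container is named kubelet in the unit above):

kubectl apply -f logging-test1.yaml
kubectl exec logging-test1 -- df -h /logs
docker logs kubelet 2>&1 | grep -i bounded-local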

Anything else we need to know:

I appreciate that FlexVolume is in alpha and therefore likely to change often. I'm happy to update as required, but from the documentation as it stands I'm finding it hard to understand how the new functions work and what needs to happen to be "added to the list of VolumesInUse in the node's volume status".

Thanks

Dave

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 28 (20 by maintainers)

Most upvoted comments

@shrinandj Both exit statuses will work fine.

Hi, this is now working for me. I had introduced a problem when I changed mount to expect 3 arguments, thinking I needed to support attach/detach, and then didn't reduce it back to 2 again when I reverted to mount/unmount only (which was preferable).
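A sketch of that fix (handler and variable names are illustrative, not necessarily the driver's actual ones):

# Broken: mount handler written for the attach/detach flow, expecting
#   <driver> mount <mount dir> <device> <json options>
# Fixed: mount/unmount only, so the kubelet passes no device:
#   <driver> mount <mount dir> <json options>
domount() {
  MNTPATH="$1"
  JSON_OPTS="$2"   # was "$3" while a device argument was still expected
  # ... create and loop-mount the sparse file under ${MNTPATH} ...
}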

So I still see lots of errors on the attach/detach side, but it then goes on to mount the volume, e.g.:

E0522 09:44:59.577753    7900 nestedpendingoperations.go:262] Operation for "\"zopa.com/bounded-local/logging\"" failed. No retries permitted until 2017-05-22 09:45:00.0777355 +0000 UTC (durationBeforeRetry 500ms). Error: Volume "zopa.com/bounded-local/logging" (spec.Name: "logging") pod "52026872-3ed3-11e7-9bee-0a33a06e86e2" (UID: "52026872-3ed3-11e7-9bee-0a33a06e86e2") has not yet been added to the list of VolumesInUse in the node's volume status
I0522 09:45:00.079189    7900 reconciler.go:231] VerifyControllerAttachedVolume operation started for volume "zopa.com/bounded-local/logging" (spec.Name: "logging") pod "52026872-3ed3-11e7-9bee-0a33a06e86e2" (UID: "52026872-3ed3-11e7-9bee-0a33a06e86e2")
E0522 09:45:00.080012    7900 nestedpendingoperations.go:262] Operation for "\"zopa.com/bounded-local/logging\"" failed. No retries permitted until 2017-05-22 09:45:01.079993732 +0000 UTC (durationBeforeRetry 1s). Error: Volume "zopa.com/bounded-local/logging" (spec.Name: "logging") pod "52026872-3ed3-11e7-9bee-0a33a06e86e2" (UID: "52026872-3ed3-11e7-9bee-0a33a06e86e2") has not yet been added to the list of VolumesInUse in the node's volume status

This is improved if I change the default "Not supported" response to a log and exit 0. Can I safely do this? That is, change:

case "$op" in
  init)
    init $*
    ;;
  mount)
    domount $*
    ;;
  unmount)
    unmount $*
    ;;
  *)
    err "{ \"status\": \"Not supported\" }"
    exit 1
esac

to

case "$op" in
  init)
    init $*
    ;;
  mount)
    domount $*
    ;;
  unmount)
    unmount $*
    ;;
  *)
    log "{ \"status\": \"Not supported\" }"
    exit 0
esac

Thanks for your help!