oci-cloud-controller-manager: Failed to get ProviderID by nodeName - StorageClass "oci-bv"

BUG REPORT

Versions

CCM Version:

v0.9

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-26T03:47:41Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.12", GitCommit:"7cd5e9086de8ae25d6a1514d0c87bac67ca4a481", GitTreeState:"clean", BuildDate:"2020-11-12T09:11:15Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
  • Kernel (e.g. uname -a):
Linux master01 5.4.0-1028-oracle #29~18.04.1-Ubuntu SMP Tue Oct 6 13:05:53 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

What happened?

We are trying to implement the new CSI plugin on a self-managed Kubernetes cluster (v1.18.5), following this guide: https://github.com/oracle/oci-cloud-controller-manager/blob/master/container-storage-interface.md

When we create a PVC and deploy a Pod, we can see a new Block Volume being created, but it is never attached to the respective node.

When inspecting the Pod, we get the following error:

Events:
  Type     Reason              Age                   From                     Message
  ----     ------              ----                  ----                     -------
  Normal   Scheduled           5m42s                 default-scheduler        Successfully assigned default/app1 to worker03
  Warning  FailedAttachVolume  92s (x10 over 5m42s)  attachdetach-controller  AttachVolume.Attach failed for volume "csi-b1b9fabc-faca-4dab-8c96-f97c7c321d43" : rpc error: code = InvalidArgument desc = failed to get ProviderID by nodeName. error : missing provider id for node worker03
  Warning  FailedMount         82s (x2 over 3m39s)   kubelet, worker03    Unable to attach or mount volumes: unmounted volumes=[persistent-storage], unattached volumes=[persistent-storage default-token-drrj8]: timed out waiting for the condition

Looking at the node labels:

kubectl get nodes --show-labels

NAME           STATUS   ROLES    AGE     VERSION   LABELS
master01   Ready    master   5h25m   v1.18.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=ME-JEDDAH-1-AD-1,kubernetes.io/arch=amd64,kubernetes.io/hostname=master01,kubernetes.io/os=linux,node-role.kubernetes.io/master=
worker01   Ready    <none>   5h24m   v1.18.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=ME-JEDDAH-1-AD-1,kubernetes.io/arch=amd64,kubernetes.io/hostname=worker01,kubernetes.io/os=linux
worker03   Ready    <none>   5h24m   v1.18.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=ME-JEDDAH-1-AD-1,kubernetes.io/arch=amd64,kubernetes.io/hostname=worker03,kubernetes.io/os=linux
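
The providerID mentioned in the error is stored in the node spec rather than in the labels; one way to inspect it (it should contain the instance OCID once the kubelet is configured with a provider ID) is:

~$ kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID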

What you expected to happen?

We expect the “oci-bv” StorageClass to provision an OCI Block Volume, attach it to the right node, and then mount it into the right Pod.

How to reproduce it (as minimally and precisely as possible)?

1- A clean self-managed Kubernetes cluster v1.18.5

2- OCI Console -> Identity -> Dynamic Groups

All {instance.compartment.id = 'ocid1.compartment.oc1..XXXXXXXXXXXX'}	

3- OCI Console -> Identity -> Policies

Allow dynamic-group oci_csi_group to read vnic-attachments in compartment Test_Compartment
Allow dynamic-group oci_csi_group to read vnics in compartment Test_Compartment
Allow dynamic-group oci_csi_group to read instances in compartment Test_Compartment
Allow dynamic-group oci_csi_group to read subnets in compartment Test_Compartment
Allow dynamic-group oci_csi_group to use volumes in compartment Test_Compartment
Allow dynamic-group oci_csi_group to use instances in compartment Test_Compartment
Allow dynamic-group oci_csi_group to manage volume-attachments in compartment Test_Compartment
Allow dynamic-group oci_csi_group to manage volumes in compartment Test_Compartment
Allow dynamic-group oci_csi_group to manage file-systems in compartment Test_Compartment

4- Create the generic secret oci-volume-provisioner with the following configuration (an example command is sketched after the config)

auth:
  region: me-jeddah-1
  tenancy: ocid1.tenancy.oc1..XXXXXX
  useInstancePrincipals: true
compartment: ocid1.compartment.oc1..XXXXXX  # Test_Compartment
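
A sketch of how the secret might be created, assuming the configuration above is saved locally as config.yaml and that the driver expects the key name config.yaml (adjust to match the guide):

~$ kubectl create secret generic oci-volume-provisioner -n kube-system --from-file=config.yaml=config.yaml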

5- Apply the manifest

~$  kubectl apply -f https://raw.githubusercontent.com/oracle/oci-cloud-controller-manager/master/manifests/container-storage-interface/oci-csi-node-rbac.yaml

~$  kubectl apply -f https://raw.githubusercontent.com/oracle/oci-cloud-controller-manager/master/manifests/container-storage-interface/oci-csi-controller-driver.yaml

~$ kubectl apply -f https://raw.githubusercontent.com/oracle/oci-cloud-controller-manager/master/manifests/container-storage-interface/oci-csi-node-driver.yaml

~$  kubectl apply -f https://raw.githubusercontent.com/oracle/oci-cloud-controller-manager/master/manifests/container-storage-interface/storage-class.yaml
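
The StorageClass created by the last manifest can be confirmed with, for example:

~$ kubectl get storageclass oci-bv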

6- Verify

~$ kubectl -n kube-system get pod  | grep oci
csi-oci-controller-56ddc7fc8d-gl2qp        3/3     Running   0          51m
csi-oci-node-4v9pf                         2/2     Running   0          51m
csi-oci-node-8wdtp                         2/2     Running   0          51m
csi-oci-node-kfcgg                         2/2     Running   0          51m

7- Manually update the failure-domain.beta.kubernetes.io/zone label on each node

kubectl label nodes master01 failure-domain.beta.kubernetes.io/zone=ME-JEDDAH-1-AD-1 --overwrite
kubectl label nodes worker01 failure-domain.beta.kubernetes.io/zone=ME-JEDDAH-1-AD-1 --overwrite
kubectl label nodes worker03 failure-domain.beta.kubernetes.io/zone=ME-JEDDAH-1-AD-1 --overwrite
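
One way to confirm the label took effect (the -L flag prints the label value as a column):

~$ kubectl get nodes -L failure-domain.beta.kubernetes.io/zone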

8- Deploy the Pod and PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: oci-bv-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: oci-bv
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app1
spec:
  containers:
    - name: app1
      image: centos
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: oci-bv-claim
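
Assuming the manifest above is saved as app1.yaml, it is applied and checked with:

~$ kubectl apply -f app1.yaml
~$ kubectl get pvc oci-bv-claim
~$ kubectl get pod app1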

9- kubectl describe pvc/oci-bv-claim

  Normal  Provisioning           10m (x2 over 10m)  blockvolume.csi.oraclecloud.com_master01_2a582dbf-7cec-467a-81af-c6d02efb6144  External provisioner is provisioning volume for claim "default/oci-bv-claim"
  Normal  ProvisioningSucceeded  10m (x2 over 10m)  blockvolume.csi.oraclecloud.com_master01_2a582dbf-7cec-467a-81af-c6d02efb6144  Successfully provisioned volume csi-b1b9fabc-faca-4dab-8c96-f97c7c321d43

10- kubectl describe pod app1

  Warning  FailedAttachVolume  24s (x13 over 10m)    attachdetach-controller  AttachVolume.Attach failed for volume "csi-b1b9fabc-faca-4dab-8c96-f97c7c321d43" : rpc error: code = InvalidArgument desc = failed to get ProviderID by nodeName. error : missing provider id for node worker03

Anything else we need to know?

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 61 (28 by maintainers)

Most upvoted comments

Hi @Bytelegion ,

Can you please share the following details in a new issue:

  • K8s cluster and node version
  • Output of the following for the csi-oci-node-* pod running on the node where the mount is failing:
    • kubectl describe pod <csi-oci-node-*> -n kube-system
    • Logs of the sidecar container csi-node-registrar in the csi-oci-node-* pod
  • kubelet logs from the node where the mount is failing

Thanks.
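
For example, the logs could be collected with something like the following (the pod name is a placeholder; use the csi-oci-node pod scheduled on the affected worker):

~$ kubectl -n kube-system get pods -o wide | grep csi-oci-node
~$ kubectl -n kube-system describe pod csi-oci-node-xxxxx
~$ kubectl -n kube-system logs csi-oci-node-xxxxx -c csi-node-registrar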

As per the README at https://github.com/oracle/oci-cloud-controller-manager, I think you are missing the "Preparing Your Cluster" step: to deploy the Cloud Controller Manager (CCM), your cluster must be configured to use an external cloud provider.

This involves:

  • Setting the --cloud-provider=external flag on the kubelet on all nodes in your cluster.
  • Setting the --provider-id=<instanceID> flag on the kubelet on all nodes in your cluster, where <instanceID> is the instance OCID of the node (unique for each node).
  • Setting the --cloud-provider=external flag on the kube-controller-manager in your Kubernetes control plane.

I would say do not set --provider-id and see what happens, but the other two steps are important.
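
A minimal sketch of how these flags might be set on a kubeadm-based cluster; the drop-in file name and manifest path are assumptions, and the instance OCID is a placeholder:

# On every node: read the instance OCID from the OCI instance metadata endpoint,
# then pass the flags to the kubelet via a systemd drop-in and restart it.
~$ curl -s http://169.254.169.254/opc/v1/instance/ | grep '"id"'
~$ cat /etc/systemd/system/kubelet.service.d/20-oci.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--cloud-provider=external --provider-id=ocid1.instance.oc1.me-jeddah-1.xxxxxxxx"
~$ systemctl daemon-reload && systemctl restart kubelet

# On the control plane: add --cloud-provider=external to the kube-controller-manager
# arguments in /etc/kubernetes/manifests/kube-controller-manager.yaml (assumed kubeadm path).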