external-provisioner: Pod is not created in the selected zone of the volume

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: I am testing dynamic provisioning with the EBS CSI driver and delayed volume binding. Most of the time the pod is created in the same zone as the volume, but in one run the pod failed to start because it was scheduled in a different zone than the volume.

What you expected to happen: With volume scheduling enabled, the volume and the pod should always end up in the same topology domain (here, the same availability zone).
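
For reference, when topology-aware provisioning works as intended, the PV created by the external-provisioner is expected to carry a node affinity for the zone of the node the scheduler picked for the pod, so the volume can only be attached in that zone. A rough sketch of such a PV is below; the PV name, volume ID, and zone value are illustrative and not taken from the logs, while the topology key matches the driver's key seen in the provisioner log further down:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-<uid>                              # name assigned by the provisioner
spec:
  capacity:
    storage: 4Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: com.amazon.aws.csi.ebs
    volumeHandle: vol-xxxxxxxxxxxxxxxxx        # EBS volume ID (illustrative)
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: com.amazon.aws.csi.ebs/zone # driver topology key from the log below
              operator: In
              values:
                - us-east-1b                   # zone of the node the pod was scheduled to (illustrative)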

How to reproduce it (as minimally and precisely as possible): Non-deterministic so far

Anything else we need to know?: Provisioner log:

I1015 20:30:34.630580       1 controller.go:991] provision "default/late-claim" class "late-sc": started
I1015 20:30:34.643414       1 controller.go:121] GRPC call: /csi.v0.Identity/GetPluginCapabilities
I1015 20:30:34.643430       1 controller.go:122] GRPC request:
I1015 20:30:34.643634       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"late-claim", UID:"1d44c823-d0b9-11e8-81f1-0a75e9a76798", APIVersion:"v1", ResourceVersion:"1694", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/late-claim"
I1015 20:30:34.644379       1 controller.go:124] GRPC response: capabilities:<service:<type:CONTROLLER_SERVICE > > capabilities:<service:<type:ACCESSIBILITY_CONSTRAINTS > >
I1015 20:30:34.644443       1 controller.go:125] GRPC error: <nil>
I1015 20:30:34.644453       1 controller.go:121] GRPC call: /csi.v0.Controller/ControllerGetCapabilities
I1015 20:30:34.644459       1 controller.go:122] GRPC request:
I1015 20:30:34.645083       1 controller.go:124] GRPC response: capabilities:<rpc:<type:CREATE_DELETE_VOLUME > > capabilities:<rpc:<type:PUBLISH_UNPUBLISH_VOLUME > >
I1015 20:30:34.645139       1 controller.go:125] GRPC error: <nil>
I1015 20:30:34.645151       1 controller.go:121] GRPC call: /csi.v0.Identity/GetPluginInfo
I1015 20:30:34.645190       1 controller.go:122] GRPC request:
I1015 20:30:34.645621       1 controller.go:124] GRPC response: name:"com.amazon.aws.csi.ebs" vendor_version:"0.0.1"
I1015 20:30:34.645658       1 controller.go:125] GRPC error: <nil>
I1015 20:30:34.661737       1 controller.go:428] CreateVolumeRequest {Name:pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798 CapacityRange:required_bytes:4294967296  VolumeCapabilities:[mount:<> access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] ControllerCreateSecrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:requisite:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1a" > > requisite:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1b" > > requisite:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1c" > >  XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1015 20:30:34.661862       1 controller.go:121] GRPC call: /csi.v0.Controller/CreateVolume
I1015 20:30:34.661868       1 controller.go:122] GRPC request: name:"pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798" capacity_range:<required_bytes:4294967296 > volume_capabilities:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > accessibility_requirements:<requisite:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1a" > > requisite:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1b" > > requisite:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1c" > > >
I1015 20:30:34.841760       1 leaderelection.go:227] successfully renewed lease default/com.amazon.aws.csi.ebs
I1015 20:30:35.114422       1 controller.go:124] GRPC response: volume:<capacity_bytes:4294967296 id:"vol-0c696d140008a61a8" accessible_topology:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1a" > > >
I1015 20:30:35.114527       1 controller.go:125] GRPC error: <nil>
I1015 20:30:35.114540       1 controller.go:484] create volume rep: {CapacityBytes:4294967296 Id:vol-0c696d140008a61a8 Attributes:map[] ContentSource:<nil> AccessibleTopology:[segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1a" > ] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1015 20:30:35.114631       1 controller.go:546] successfully created PV {GCEPersistentDisk:nil AWSElasticBlockStore:nil HostPath:nil Glusterfs:nil NFS:nil RBD:nil ISCSI:nil Cinder:nil CephFS:nil FC:nil Flocker:nil FlexVolume:nil AzureFile:nil VsphereVolume:nil Quobyte:nil AzureDisk:nil PhotonPersistentDisk:nil PortworxVolume:nil ScaleIO:nil Local:nil StorageOS:nil CSI:&CSIPersistentVolumeSource{Driver:com.amazon.aws.csi.ebs,VolumeHandle:vol-0c696d140008a61a8,ReadOnly:false,FSType:ext4,VolumeAttributes:map[string]string{storage.kubernetes.io/csiProvisionerIdentity: 1539635296092-8081-com.amazon.aws.csi.ebs,},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,}}
I1015 20:30:35.114740       1 controller.go:1091] provision "default/late-claim" class "late-sc": volume "pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798" provisioned
I1015 20:30:35.114760       1 controller.go:1105] provision "default/late-claim" class "late-sc": trying to save persistentvvolume "pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798"
I1015 20:30:35.134870       1 controller.go:1112] provision "default/late-claim" class "late-sc": persistentvolume "pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798" saved
I1015 20:30:35.134930       1 controller.go:1153] provision "default/late-claim" class "late-sc": succeeded
I1015 20:30:35.135246       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"late-claim", UID:"1d44c823-d0b9-11e8-81f1-0a75e9a76798", APIVersion:"v1", ResourceVersion:"1694", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798

EBS Driver Log:

I1015 20:28:16.294223       1 driver.go:52] Driver: com.amazon.aws.csi.ebs
I1015 20:28:16.294360       1 mount_linux.go:199] Detected OS without systemd
I1015 20:28:16.294928       1 driver.go:107] Listening for connections on address: &net.UnixAddr{Name:"/var/lib/csi/sockets/pluginproxy/csi.sock", Net:"unix"}
I1015 20:30:34.644708       1 controller.go:175] ControllerGetCapabilities: called with args &csi.ControllerGetCapabilitiesRequest{XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
I1015 20:30:34.662445       1 controller.go:31] CreateVolume: called with args &csi.CreateVolumeRequest{Name:"pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798", CapacityRange:(*csi.CapacityRange)(0xc0001ab560), VolumeCapabilities:[]*csi.VolumeCapability{(*csi.VolumeCapability)(0xc0001b0b80)}, Parameters:map[string]string(nil), ControllerCreateSecrets:map[string]string(nil), VolumeContentSource:(*csi.VolumeContentSource)(nil), AccessibilityRequirements:(*csi.TopologyRequirement)(0xc0001d78b0), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}

Pod description and events:

Name:               app
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               ip-172-20-127-156.ec2.internal/172.20.127.156
Start Time:         Mon, 15 Oct 2018 13:30:35 -0700
Labels:             <none>
Annotations:        kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container app
Status:             Pending
IP:
Containers:
  app:
    Container ID:
    Image:         centos
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      while true; do echo $(date -u) >> /data/out.txt; sleep 5; done
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /data from persistent-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-rw8jc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  persistent-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  late-claim
    ReadOnly:   false
  default-token-rw8jc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-rw8jc
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Normal   Scheduled           18s   default-scheduler        Successfully assigned default/app to ip-172-20-127-156.ec2.internal
  Warning  FailedAttachVolume  17s   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798" : rpc error: code = Internal desc = Could not attach volume "vol-0c696d140008a61a8" to node "i-0bf114cd21779ff49": could not attach volume "vol-0c696d140008a61a8" to node "i-0bf114cd21779ff49": InvalidVolume.ZoneMismatch: The volume 'vol-0c696d140008a61a8' is not in the same availability zone as instance 'i-0bf114cd21779ff49'
           status code: 400, request id: 33c4a9cc-37d1-4e78-b37b-c9df81f659f9
  Warning  FailedAttachVolume  17s  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798" : rpc error: code = Internal desc = Could not attach volume "vol-0c696d140008a61a8" to node "i-0bf114cd21779ff49": could not attach volume "vol-0c696d140008a61a8" to node "i-0bf114cd21779ff49": InvalidVolume.ZoneMismatch: The volume 'vol-0c696d140008a61a8' is not in the same availability zone as instance 'i-0bf114cd21779ff49'
           status code: 400, request id: 8baabab8-e0e1-4063-9107-ea86cb7c9fda
  Warning  FailedAttachVolume  16s  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798" : rpc error: code = Internal desc = Could not attach volume "vol-0c696d140008a61a8" to node "i-0bf114cd21779ff49": could not attach volume "vol-0c696d140008a61a8" to node "i-0bf114cd21779ff49": InvalidVolume.ZoneMismatch: The volume 'vol-0c696d140008a61a8' is not in the same availability zone as instance 'i-0bf114cd21779ff49'
           status code: 400, request id: fff0faa0-df0e-4ad8-af27-0483267b09f7
  Warning  FailedAttachVolume  14s  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798" : rpc error: code = Internal desc = Could not attach volume "vol-0c696d140008a61a8" to node "i-0bf114cd21779ff49": could not attach volume "vol-0c696d140008a61a8" to node "i-0bf114cd21779ff49": InvalidVolume.ZoneMismatch: The volume 'vol-0c696d140008a61a8' is not in the same availability zone as instance 'i-0bf114cd21779ff49'
           status code: 400, request id: 2b03cea9-1ccb-4f65-91f8-bca33dab29f1
  Warning  FailedAttachVolume  10s  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798" : rpc error: code = Internal desc = Could not attach volume "vol-0c696d140008a61a8" to node "i-0bf114cd21779ff49": could not attach volume "vol-0c696d140008a61a8" to node "i-0bf114cd21779ff49": InvalidVolume.ZoneMismatch: The volume 'vol-0c696d140008a61a8' is not in the same availability zone as instance 'i-0bf114cd21779ff49'
           status code: 400, request id: 8b1129ab-1289-493a-a02b-981aa9d9478f
  Warning  FailedAttachVolume  2s  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-1d44c823-d0b9-11e8-81f1-0a75e9a76798" : rpc error: code = Internal desc = Could not attach volume "vol-0c696d140008a61a8" to node "i-0bf114cd21779ff49": could not attach volume "vol-0c696d140008a61a8" to node "i-0bf114cd21779ff49": InvalidVolume.ZoneMismatch: The volume 'vol-0c696d140008a61a8' is not in the same availability zone as instance 'i-0bf114cd21779ff49'
           status code: 400, request id: 3a1f317d-8240-4f16-99ff-12982b1d673c

>> cat late-bind-sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: late-sc
provisioner: com.amazon.aws.csi.ebs
volumeBindingMode: WaitForFirstConsumer
>> cat late-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: late-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: late-sc
  resources:
    requests:
      storage: 4Gi
>> cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: late-claim

Environment:

  • Kubernetes version (use kubectl version): client: v1.12.0 server: v1.12.1
  • Cloud provider or hardware configuration: aws
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools: cluster is set up using kops
  • Others:
    • external-registrar: v0.4.0
    • external-provisioner: v0.4.0
    • external-attacher: v0.4.0

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 21 (7 by maintainers)

Most upvoted comments

After upgrading the provisioner/attacher/registrar to v0.4.1, I can see the preferred topology in the CreateVolume request as well.
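
For anyone hitting the same mismatch: the CreateVolume request in the provisioner log above carries only requisite zone entries (all three zones) and no preferred entry, so the driver is free to place the volume in any listed zone regardless of where the pod was scheduled. Per the comment above, after upgrading the sidecars to v0.4.1 the request also carries preferred entries, which the driver can use to place the volume in the scheduled node's zone. A minimal sketch of the sidecar image bump follows; the image names, tags, container names, and manifest layout are assumptions based on the common quay.io/k8scsi sidecar images of that era, so check your own deployment:

# controller StatefulSet/Deployment sidecars (layout illustrative)
- name: csi-provisioner
  image: quay.io/k8scsi/csi-provisioner:v0.4.1   # was v0.4.0
- name: csi-attacher
  image: quay.io/k8scsi/csi-attacher:v0.4.1      # was v0.4.0
# node DaemonSet sidecar (layout illustrative)
- name: driver-registrar
  image: quay.io/k8scsi/driver-registrar:v0.4.1  # was v0.4.0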