kubernetes: StatefulSet Pod stuck in pending waiting for volumes to attach/mount (AWS)
BUG REPORT
Kubernetes version (use kubectl version): v1.5.1 (hyperkube v1.5.1_coreos.0)
Environment:
- Cloud provider or hardware configuration: AWS (private subnets across 3 AZs)
- OS (e.g. from /etc/os-release): CoreOS stable
- Kernel (e.g. uname -a): 4.7.3-coreos-r2
- Install tools: Cloudinit + Hyperkube
- Others:
What happened: After creating a storage class:
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
and a StatefulSet:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: consul
  namespace: core
spec:
  serviceName: consul
  replicas: 3
  template:
    metadata:
      labels:
        app: consul
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: consul
        image: consul:0.7.2
        ports:
        - containerPort: 8500
        - containerPort: 8600
        volumeMounts:
        - name: consul-data
          mountPath: /var/lib/consul
  volumeClaimTemplates:
  - metadata:
      name: consul-data
      annotations:
        volume.alpha.kubernetes.io/storage-class: ssd
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
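For completeness, the created objects can be confirmed after applying the manifests above. This is a hedged sketch, not part of the original report; the `kn` command used in the outputs further down appears to be a `kubectl --namespace core` alias, but that is an assumption, so plain kubectl with an explicit namespace is shown here.
# Hedged sanity checks (assumed commands, not from the original report)
kubectl get storageclass ssd
kubectl --namespace core get statefulset consul
kubectl --namespace core get pvc                  # claims should bind to dynamically provisioned PVs
kubectl --namespace core get pods -l app=consul -w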
All three requested EBS volumes are created and the first pod is scheduled. The volume is attached to the instance, but then nothing else happens: the mount eventually times out and the pod is stuck in Pending:
λ kn describe pod consul-0
Name:           consul-0
Namespace:      core
Node:           ip-172-20-12-10.eu-west-1.compute.internal/172.20.12.10
Start Time:     Tue, 20 Dec 2016 17:12:00 +0000
Labels:         app=consul
Status:         Pending
IP:
Controllers:    StatefulSet/consul
Containers:
  consul:
    Container ID:
    Image:              consul:0.7.2
    Image ID:
    Ports:              8500/TCP, 8600/TCP
    State:              Waiting
      Reason:           ContainerCreating
    Ready:              False
    Restart Count:      0
    Volume Mounts:
      /var/lib/consul from consul-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vtb0f (ro)
    Environment Variables:      <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  consul-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  consul-data-consul-0
    ReadOnly:   false
  default-token-vtb0f:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-vtb0f
QoS Class:      BestEffort
Tolerations:    <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 {default-scheduler } Normal Scheduled Successfully assigned consul-0 to ip-172-20-12-10.eu-west-1.compute.internal
48s 48s 1 {kubelet ip-172-20-12-10.eu-west-1.compute.internal} Warning FailedMount Unable to mount volumes for pod "consul-0_core(6ba1ec70-c6d7-11e6-a1f4-0622fbf400d9)": timeout expired waiting for volumes to attach/mount for pod "consul-0"/"core". list of unattached/unmounted volumes=[consul-data]
48s 48s 1 {kubelet ip-172-20-12-10.eu-west-1.compute.internal} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "consul-0"/"core". list of unattached/unmounted volumes=[consul-data]
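The next place worth looking is the kubelet on the node that received the pod. The following is a hedged sketch, assuming the kubelet runs as a systemd unit named kubelet.service on the CoreOS node, which is common for hyperkube-on-CoreOS installs but is not confirmed by this report:
# Hedged diagnostic (assumed unit name, not from the original report)
ssh core@172.20.12.10
journalctl -u kubelet.service --since "20 min ago" | grep -iE "consul-data|xvdba|attach|mount"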
What you expected to happen: The mount should have completed and the pod started.
How to reproduce it (as minimally and precisely as possible): Apply the manifests above.
Anything else we need to know:
➜ kn get pv
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                        REASON    AGE
pvc-240c5146-c6cf-11e6-a1f4-0622fbf400d9   1Gi        RWO           Delete          Bound     core/consul-data-consul-0              1h
pvc-240d7caa-c6cf-11e6-a1f4-0622fbf400d9   1Gi        RWO           Delete          Bound     core/consul-data-consul-1              1h
pvc-240e4cd0-c6cf-11e6-a1f4-0622fbf400d9   1Gi        RWO           Delete          Bound     core/consul-data-consul-2              1h
➜ kn get pvc
NAME                   STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
consul-data-consul-0   Bound     pvc-240c5146-c6cf-11e6-a1f4-0622fbf400d9   1Gi        RWO           1h
consul-data-consul-1   Bound     pvc-240d7caa-c6cf-11e6-a1f4-0622fbf400d9   1Gi        RWO           1h
consul-data-consul-2   Bound     pvc-240e4cd0-c6cf-11e6-a1f4-0622fbf400d9   1Gi        RWO           1h
➜ kn describe pv pvc-240c5146-c6cf-11e6-a1f4-0622fbf400d9
Name:           pvc-240c5146-c6cf-11e6-a1f4-0622fbf400d9
Labels:         failure-domain.beta.kubernetes.io/region=eu-west-1
                failure-domain.beta.kubernetes.io/zone=eu-west-1c
StorageClass:
Status:         Bound
Claim:          core/consul-data-consul-0
Reclaim Policy: Delete
Access Modes:   RWO
Capacity:       1Gi
Message:
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://eu-west-1c/vol-06a263cc5dbe35d74
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
No events.
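Note that this PV was provisioned in eu-west-1c. Since an EBS volume can only be attached to an instance in its own availability zone, a hedged cross-check (not part of the original report) is to compare the zone label of the node the pod was scheduled on:
# Hedged cross-check (assumed command, not from the original report)
kubectl get node ip-172-20-12-10.eu-west-1.compute.internal \
  -L failure-domain.beta.kubernetes.io/zone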
➜ aws ec2 describe-volumes --volume-ids vol-06a263cc5dbe35d74
{
    "Volumes": [
        {
            "VolumeId": "vol-06a263cc5dbe35d74",
            "Size": 1,
            "CreateTime": "2016-12-20T16:12:45.206Z",
            "State": "in-use",
            "Iops": 100,
            "Encrypted": false,
            "VolumeType": "gp2",
            "Tags": [
                {
                    "Key": "KubernetesCluster",
                    "Value": "eu-west-1.kube.usw.co"
                },
                {
                    "Key": "Name",
                    "Value": "kubernetes-dynamic-pvc-240c5146-c6cf-11e6-a1f4-0622fbf400d9"
                },
                {
                    "Key": "kubernetes.io/created-for/pv/name",
                    "Value": "pvc-240c5146-c6cf-11e6-a1f4-0622fbf400d9"
                },
                {
                    "Key": "kubernetes.io/created-for/pvc/name",
                    "Value": "consul-data-consul-0"
                },
                {
                    "Key": "kubernetes.io/created-for/pvc/namespace",
                    "Value": "core"
                }
            ],
            "Attachments": [
                {
                    "VolumeId": "vol-06a263cc5dbe35d74",
                    "State": "attached",
                    "Device": "/dev/xvdba",
                    "InstanceId": "i-0baea1a1746a216d7",
                    "DeleteOnTermination": false,
                    "AttachTime": "2016-12-20T16:12:48.000Z"
                }
            ],
            "SnapshotId": "",
            "AvailabilityZone": "eu-west-1c"
        }
    ]
}
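So AWS reports the volume as in-use and attached to i-0baea1a1746a216d7 as /dev/xvdba, yet the kubelet still times out waiting for attach/mount. A hedged cross-check (not part of the original report) is to confirm that this instance is in fact the node the pod was scheduled on, and that the block device is visible there:
# Hedged cross-check (assumed commands, not from the original report)
aws ec2 describe-instances --instance-ids i-0baea1a1746a216d7 \
  --query 'Reservations[].Instances[].PrivateDnsName'
ssh core@172.20.12.10 lsblk    # xvdba should be listed if the attach really completed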
About this issue
- State: closed
- Created 8 years ago
- Comments: 22 (9 by maintainers)
Still having the same issue on k8s v1.9.