kubernetes: Pod scheduling not respecting PVC zone (NoVolumeZoneConflict)

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug
/sig scheduling

(Not sure about the correct SIG)

What happened:

After migrating a cluster to 1.8 (or creating a new one on 1.8), pods aren’t being scheduled only onto nodes in the same zone as their PVCs.

What you expected to happen:

Pods with a PVC should only be scheduled onto nodes within the same zone as the PVC’s volume.

How to reproduce it (as minimally and precisely as possible):

  1. Start a new cluster in AWS (not sure whether other cloud providers are affected)
  2. Add nodes in multiple availability zones (e.g. us-west-2a, us-west-2b, us-west-2c)
  3. Create a StatefulSet with dynamic PVC provisioning through AWS EBS. Also add a pod anti-affinity on a topology key so that the pods get spread across nodes (a sketch of such a manifest follows below)

After this, the first pod will start correctly, but the second will fail to start because it will probably be scheduled in the wrong availability zone. I say probably because this behavior seems to have some randomness to it: I tried deleting the failing pod a couple of times, and sometimes I got lucky and the pod was scheduled in the correct zone.
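For reference, a minimal manifest along these lines reproduces the setup. This is a sketch only: the `zone-test` names, the `nginx` image, the `gp2` StorageClass, and the hostname topology key are assumptions, not my exact configuration.

```yaml
apiVersion: apps/v1beta2        # StatefulSet API group as of Kubernetes 1.8
kind: StatefulSet
metadata:
  name: zone-test
spec:
  serviceName: zone-test
  replicas: 3
  selector:
    matchLabels:
      app: zone-test
  template:
    metadata:
      labels:
        app: zone-test
    spec:
      affinity:
        podAntiAffinity:
          # Spread replicas across nodes, so each replica lands in a
          # (potentially) different zone with its own zone-local EBS volume.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: zone-test
            topologyKey: kubernetes.io/hostname
      containers:
      - name: app
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: gp2     # assumes a gp2 (AWS EBS) StorageClass exists
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

Each PVC is provisioned in whichever zone its EBS volume lands in, so the scheduler’s NoVolumeZoneConflict predicate is the only thing keeping each pod in its volume’s zone.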

Environment:

  • Kubernetes version (use kubectl version): 1.8
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Arch Linux
  • Kernel (e.g. uname -a): Linux ip-10-0-50-163 4.12.10-1-ec2 #1 SMP Fri Sep 1 22:37:26 PDT 2017 x86_64 GNU/Linux
  • Install tools: Custom Cluster from Scratch
  • Others:

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 35 (27 by maintainers)

Most upvoted comments

Just to follow up, removing the AllAlpha=true feature gate from our scheduler config (which disabled EnableEquivalenceClassCache) did solve it! Thanks for everyone’s help.
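For anyone else landing here: the fix above amounts to not enabling every alpha gate on the scheduler, which in 1.8 pulls in the equivalence class cache. A sketch of the relevant flag, assuming kube-scheduler runs as a static pod; the image tag and surrounding fields are illustrative:

```yaml
# Fragment of a kube-scheduler pod spec (illustrative). Instead of
#   --feature-gates=AllAlpha=true
# either drop the flag entirely, or pin the offending gate off:
containers:
- name: kube-scheduler
  image: gcr.io/google_containers/kube-scheduler-amd64:v1.8.0
  command:
  - kube-scheduler
  - --feature-gates=EnableEquivalenceClassCache=false
```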

+1

The first pod in the StatefulSet works correctly. The second pod always gets scheduled in the incorrect zone for its volume.

```
AttachVolume.Attach failed for volume "pvc-d55479d3-c3d5-11e7-b334-0a1686dec4b4" : Error attaching EBS volume "vol-0cc689cf71b5c4148" to instance "i-0dee75d43c0299fd6": "InvalidVolume.ZoneMismatch: The volume 'vol-0cc689cf71b5c4148' is not in the same availability zone as instance 'i-0dee75d43c0299fd6'
	status code: 400, request id: 866aa429-a936-4468-b502-1486e075a969"
```
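For context on why the attach fails at all: the NoVolumeZoneConflict predicate compares the zone label that the AWS provisioner stamps onto the PV against the candidate node’s labels, so a correctly scheduled pod should never reach the attach step in the wrong zone. The labels look roughly like this (zone and region values are illustrative); per the fix above, the alpha equivalence class cache appears to have been reusing a stale predicate result rather than re-evaluating these labels per pod:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-d55479d3-c3d5-11e7-b334-0a1686dec4b4
  labels:
    # Stamped by the AWS EBS provisioner; NoVolumeZoneConflict matches
    # these against the node's zone/region labels.
    failure-domain.beta.kubernetes.io/zone: us-west-2a
    failure-domain.beta.kubernetes.io/region: us-west-2
```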

I’m seeing this issue using kops v1.8; am I SOL?

@msau42 @jakexks Yes, I will send out a cherry pick of the fix to 1.8.