kubernetes: PVCs using `standard` StorageClass create PDs in the wrong zone in multi-zone GKE clusters
/kind bug
What happened:
- Created single-zone cluster in `us-central1-c`
- Expanded to multiple zones after creation using `gcloud container clusters update ... --additional-zones` to add zones `b` and `a`
- Deployed PVC with `storageClassName: standard`, which is a StorageClass included with GKE (see the sketch after this list)
- PV is created in `us-central1-f`, a zone with no nodes from this cluster in it
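A minimal reproduction sketch of the setup above. The cluster name, PVC name, and the exact additional zones are illustrative assumptions, not taken from the report:

```sh
# Create a single-zone GKE cluster (names/zones are illustrative).
gcloud container clusters create repro-cluster --zone us-central1-c

# Expand the cluster to additional zones after creation.
# (--additional-zones is the flag from that era; newer gcloud releases use --node-locations.)
gcloud container clusters update repro-cluster --zone us-central1-c \
  --additional-zones us-central1-a,us-central1-b

# Deploy a PVC that uses the GKE-provided "standard" StorageClass.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repro-pvc
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
```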
What you expected to happen:
- PV created in one of the zones that the cluster has nodes in (a quick way to check this is sketched below)
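One way to compare the zones that actually have nodes against the zone the provisioned PV landed in, using the failure-domain labels that 1.6-era clusters set on Nodes and PVs:

```sh
# Show which zone each node is in.
kubectl get nodes -L failure-domain.beta.kubernetes.io/zone

# Show which zone the dynamically provisioned PV was created in.
kubectl get pv -L failure-domain.beta.kubernetes.io/zone
```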
How to reproduce it (as minimally and precisely as possible):
See *What happened* above. I’m not sure the expansion-after-creation step is necessary.
Anything else we need to know?:
May be a duplicate of https://github.com/kubernetes/kubernetes/issues/39178, but that’s on AWS
Environment:
- Kubernetes version (use `kubectl version`): `Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:33:17Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}`
- Cloud provider or hardware configuration: GKE 1.6.4
- OS (e.g. from `/etc/os-release`): cos
About this issue
- State: closed
- Created 7 years ago
- Comments: 53 (47 by maintainers)
Commits related to this issue
- Fixes #50115 Changed GetAllZones to only get zones with nodes that are currently running (renamed to GetAllCurrentZones). Added E2E test to confirm this behavior. — committed to davidz627/kubernetes by davidz627 7 years ago
- Merge pull request #52322 from davidz627/multizoneWrongZone Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions … — committed to kubernetes/kubernetes by deleted user 7 years ago
@dims I’m just a clueless user. Don’t ask me to make decisions 😛
Were there some instances running in the project in us-central1-f? This comment explains the current heuristic and its limitations: https://github.com/kubernetes/kubernetes/blob/3c080e83c7974fa21c5a47264f992a3d0189e143/pkg/cloudprovider/providers/gce/gce_instances.go#L264-L273
Aside: I would love to fix this by having the cloudprovider get access to the kubernetes client, watch the Nodes, and then it could simply iterate over the nodes and pull the zone out of the failure-domain labels. If the cloudprovider had access to kubernetes API objects (Nodes, really) it would avoid a lot of work in the AWS provider, and I’ve heard from other cloudproviders that it would also benefit them.
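For illustration, a rough sketch of what "iterate over the nodes and pull the zone out of the failure-domain labels" could look like with client-go. The package layout, function name, and the use of a one-shot List (rather than a watch, as the comment suggests) are my assumptions, not the actual fix that was merged:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// zonesWithNodes returns the set of zones that currently have at least one
// node, read from the failure-domain zone label on each Node object.
// (Newer clusters use topology.kubernetes.io/zone instead.)
func zonesWithNodes(client kubernetes.Interface) (map[string]struct{}, error) {
	zones := map[string]struct{}{}
	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	for _, n := range nodes.Items {
		if z, ok := n.Labels["failure-domain.beta.kubernetes.io/zone"]; ok {
			zones[z] = struct{}{}
		}
	}
	return zones, nil
}

func main() {
	// Build a client from the default kubeconfig location.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	zones, err := zonesWithNodes(client)
	if err != nil {
		panic(err)
	}
	fmt.Println("zones with nodes:", zones)
}
```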