kubernetes: Upgrading to v1.5.2 creates duplicate replica sets
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): multiple replicaset, duplicate replicaset
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:57:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: AWS
- OS (e.g. from /etc/os-release): Ubuntu 14.04.3 LTS
- Kernel (e.g. uname -a): Linux ip-10-0-10-162 3.13.0-87-generic #133-Ubuntu SMP Tue May 24 18:32:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: Custom terraform+ansible setup; using hyperkube binary
- Others: This cluster has disabled auth and serviceaccounts
What happened: Upgrading the control plane to v1.5.2 (from v1.4.7) created duplicate replica sets (and hence duplicate pods) for some deployments.
What you expected to happen: Existing deployments/replica sets/pods to stay the same OR be replaced, not duplicated.
How to reproduce it (as minimally and precisely as possible): Keep a deployment with volume mounts (probably EBS) running on v1.4.7, then upgrade the control plane components to v1.5.2 (a minimal sketch follows after this report).
Anything else we need to know: Interestingly, this seems to have happened only for deployments that use shared volume mounts (either EBS or otherwise).
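For reference, a rough sketch of the kind of deployment that triggers this for us (the name, image, and EBS volume ID below are placeholders; extensions/v1beta1 is the Deployment API version on 1.4.x/1.5.x):

kubectl create -f - <<'EOF'
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ebs-demo                # hypothetical name
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: ebs-demo
    spec:
      containers:
      - name: app
        image: nginx:1.11       # any long-running image works
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        awsElasticBlockStore:
          volumeID: vol-0123456789abcdef0   # hypothetical EBS volume
          fsType: ext4
EOF

Leave it running on v1.4.7, swap the control plane binaries to v1.5.2, and then kubectl get rs -l app=ebs-demo shows whether a second replica set owned by the same deployment has appeared.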
About this issue
- State: closed
- Created 7 years ago
- Reactions: 11
- Comments: 65 (52 by maintainers)
Commits related to this issue
- Merge pull request #40854 from kargakis/upgrade-test-for-deployments Automatic merge from submit-queue (batch tested with PRs 41074, 41147, 40854, 41167, 40045) Upgrade test for deployments Upgrade... — committed to bruceauyeung/kubernetes by deleted user 7 years ago
- Merge pull request #41717 from kargakis/add-upgrade-test-logging Automatic merge from submit-queue Spew replica sets in any deployment upgrade test failure Should help identifying whether the new r... — committed to kubernetes/kubernetes by deleted user 7 years ago
- Merge pull request #41851 from janetkuo/deployment-duplicate-rs Automatic merge from submit-queue (batch tested with PRs 38957, 41819, 41851, 40667, 41373) Fix deployment helper - no assumptions on ... — committed to kubernetes/kubernetes by deleted user 7 years ago
- Merge pull request #42335 from kargakis/cherry-pick-15 Automatic merge from submit-queue fix rsListerSynced and podListerSynced for DeploymentController Cherry-pick of https://github.com/kubernetes... — committed to kubernetes/kubernetes by deleted user 7 years ago
We have the same issue. After the 1.5.2 upgrade on GKE (from 1.4.1, 1.4.2, or 1.4.7 depending on the env; no volumes involved, though), it started happening. However, it persists sporadically on certain deployments, even long after the upgrade is complete and we push new deployments.
To note, we use the 'rolling restart' hack of injecting a 'date' key into the metadata.labels, which forces the rolling restart. This is because our images have 'test' and 'prod' tags that we promote to. (kubectl patch deployment my-deployment -p '{"spec":{"template":{"metadata":{"labels":{"date":"datehere"}}}}}')
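Spelled out, the patch we run looks roughly like this (the deployment name, selector label, and date value are placeholders):

# Bump a throwaway label on the pod template to force a new rollout.
kubectl patch deployment my-deployment \
  -p '{"spec":{"template":{"metadata":{"labels":{"date":"20170215120000"}}}}}'

# Then list the replica sets backing the deployment to see whether an extra,
# duplicate one has shown up (assumes the deployment selects on app=my-app):
kubectl get rs -l app=my-app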
What's odd is that after scaling down the old RS, k8s immediately scales it back up to what it was stuck at (1 or 2 replicas usually, for a 4-6 replica deployment). Only deleting the old RS entirely gets it into the desired state.
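Concretely, the cleanup we end up doing is along these lines (the replica set name is hypothetical; use whatever kubectl get rs shows as the stale one):

# Delete the stuck old replica set outright; the deployment's current replica
# set is left alone and keeps serving.
kubectl delete rs my-deployment-1234567890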
It's easily reproducible by using the rolling restart hack and then running it again a few seconds later, while the first rollout is still in progress. Our deployment pipeline only does the restart hack once, and this still happens.
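In other words, something like the following, run back to back, is enough to trigger it for us (the deployment name and date values are placeholders):

# First rolling-restart patch...
kubectl patch deployment my-deployment \
  -p '{"spec":{"template":{"metadata":{"labels":{"date":"20170215120000"}}}}}'
# ...and a second one a few seconds later, while the first rollout is still running.
sleep 5
kubectl patch deployment my-deployment \
  -p '{"spec":{"template":{"metadata":{"labels":{"date":"20170215120005"}}}}}'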