kubernetes: Rolling-updates timing out since v1.2.0
Rolling-update tasks that have worked since v1.0 now get stuck midway on v1.2.
I'm rolling a multi-container pod from the sidekiq-rollout RC to sidekiq. The target number of replicas is 4. The first 2 pods are rolled correctly, but the job then hangs and times out (5 min) after the first 2 iterations:
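For context, the rollout is driven by a kubectl invocation along these lines (the manifest path and flag values here are illustrative, not the exact command from our deploy scripts):

```shell
# Hypothetical reproduction of the stuck rollout; names and paths are illustrative.
# sidekiq.yaml defines the new RC, whose selector carries deployment: "82".
kubectl rolling-update sidekiq-rollout -f sidekiq.yaml \
  --namespace=bodyweight-api \
  --timeout=5m
```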
NAME              DESIRED   CURRENT   AGE
sidekiq           2         2         56s
sidekiq-rollout   2         3         57s

NAME            READY   STATUS    RESTARTS   AGE   APP                COMPONENT   DEPLOYMENT
sidekiq-3mni0   3/3     Running   0          25m   fl-backend-rails   sidekiq     82   <- new
sidekiq-8glbk   3/3     Running   0          37m   fl-backend-rails   sidekiq     81
sidekiq-fe1r3   3/3     Running   0          37m   fl-backend-rails   sidekiq     81
sidekiq-r7c06   3/3     Running   0          25m   fl-backend-rails   sidekiq     82   <- new
Controller manager logs:
Mar 21 12:31:42 node01 kube-controller-manager[8697]: I0321 12:31:42.687964 8697 replication_controller.go:510] Replication Controller has been deleted bodyweight-api/sidekiq
Mar 21 12:31:46 node01 kube-controller-manager[8697]: I0321 12:31:46.191549 8697 replication_controller.go:434] Too few "bodyweight-api"/"sidekiq" replicas, need 1, creating 1
Mar 21 12:31:46 node01 kube-controller-manager[8697]: I0321 12:31:46.215334 8697 event.go:211] Event(api.ObjectReference{Kind:"ReplicationController", Namespace:"bodyweight-api", Name:"sidekiq", UID:"de5b3478-ef60-11e5-b33a-0a59d1e77755", APIVersion:"v1", ResourceVersion:"114094222", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: sidekiq-r7c06
Mar 21 12:32:01 node01 kube-controller-manager[8697]: I0321 12:32:01.725368 8697 replication_controller.go:451] Too many "bodyweight-api"/"sidekiq-rollout" replicas, need 3, deleting 1
Mar 21 12:32:01 node01 kube-controller-manager[8697]: I0321 12:32:01.772642 8697 event.go:211] Event(api.ObjectReference{Kind:"ReplicationController", Namespace:"bodyweight-api", Name:"sidekiq-rollout", UID:"ddfa58e1-ef60-11e5-b33a-0a59d1e77755", APIVersion:"v1", ResourceVersion:"114094394", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: sidekiq-2wjbr
Mar 21 12:32:07 node01 kube-controller-manager[8697]: I0321 12:32:07.757771 8697 replication_controller.go:434] Too few "bodyweight-api"/"sidekiq" replicas, need 2, creating 1
Mar 21 12:32:07 node01 kube-controller-manager[8697]: I0321 12:32:07.809994 8697 event.go:211] Event(api.ObjectReference{Kind:"ReplicationController", Namespace:"bodyweight-api", Name:"sidekiq", UID:"de5b3478-ef60-11e5-b33a-0a59d1e77755", APIVersion:"v1", ResourceVersion:"114094461", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: sidekiq-3mni0
Mar 21 12:37:16 node01 kube-controller-manager[8697]: I0321 12:37:16.324724 8697 replication_controller.go:451] Too many "bodyweight-api"/"sidekiq-rollout" replicas, need 2, deleting 1
Mar 21 12:37:16 node01 kube-controller-manager[8697]: E0321 12:37:16.327923 8697 controller_utils.go:300] Clobbering existing delete keys: map[bodyweight-api/sidekiq-2wjbr:{}]
Mar 21 12:37:16 node01 kube-controller-manager[8697]: I0321 12:37:16.350471 8697 event.go:211] Event(api.ObjectReference{Kind:"ReplicationController", Namespace:"bodyweight-api", Name:"sidekiq-rollout", UID:"ddfa58e1-ef60-11e5-b33a-0a59d1e77755", APIVersion:"v1", ResourceVersion:"114094650", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: sidekiq-bfob2
Note how long it took to scale down sidekiq-rollout on the second iteration (12:32:07 to 12:37:16 in the logs above), even though every pod marked new transitioned to Running in under 10 seconds.
Edit: my RC selectors match 3 labels. After the timeout, the two RCs look like this:
- apiVersion: v1
  kind: ReplicationController
  metadata:
    name: sidekiq-rollout
  spec:
    replicas: 2
    selector:
      app: fl-backend-rails
      component: sidekiq
      deployment: "81"
- apiVersion: v1
  kind: ReplicationController
  metadata:
    name: sidekiq
  spec:
    replicas: 2
    selector:
      app: fl-backend-rails
      component: sidekiq
      deployment: "82"
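To rule out selector overlap, here is a sketch of the matching rule a replication controller uses (a selector claims a pod when the selector is a subset of the pod's labels), applied to the two selectors above. The helper name is my own; the labels are taken from the pod table and RC specs in this report:

```python
def selector_matches(selector, pod_labels):
    """An RC claims a pod when every selector key/value pair appears in the pod's labels."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

rollout_selector = {"app": "fl-backend-rails", "component": "sidekiq", "deployment": "81"}
new_selector     = {"app": "fl-backend-rails", "component": "sidekiq", "deployment": "82"}

old_pod = {"app": "fl-backend-rails", "component": "sidekiq", "deployment": "81"}
new_pod = {"app": "fl-backend-rails", "component": "sidekiq", "deployment": "82"}

# Each RC should only claim its own generation of pods.
print(selector_matches(rollout_selector, old_pod))  # True
print(selector_matches(new_selector, old_pod))      # False
print(selector_matches(rollout_selector, new_pod))  # False
print(selector_matches(new_selector, new_pod))      # True
```

Since the selectors differ on the deployment label, they are disjoint, and neither RC should be fighting over the other's pods; the slow scale-down does not look like a selector-overlap problem.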
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 21 (16 by maintainers)
Commits related to this issue
- Merge pull request #61285 from soltysh/issue23276 — Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.c... — committed to kubernetes/kubernetes by deleted user 6 years ago
- Merge pull request #23276 from rphillips/backport/77947 — UPSTREAM: 77947: Fix panic logspam when running kubelet in standalone mode. Origin-commit: 4e58cad90a38a2c8b88721da3c5d19e050f5111b — committed to openshift/kubernetes by k8s-publishing-bot 5 years ago
@kubernetes/sig-cli-bugs
We should consider deprecating the rolling-update command.