spinnaker: "Wait For Not Up Instances" hangs for Kubernetes V1 provider

Issue Summary:

Using the Kubernetes V1 provider with GKE, disabling a server group hangs when the server group is attached to a load balancer.

Cloud Provider(s):

GKE alpha cluster v1.10.2-gke.0

Environment:

Spinnaker v1.7.4, deployed on GKE via Halyard running on a GCE VM

Feature Area:

Servergroup disabling with load balancer attached

Description:

Logs show the servergroup is disabled correctly:

2018-05-19 16:38:49.857  INFO 1 --- [pool-4-thread-2] c.n.s.c.data.task.jedis.JedisTask        : [ORCHESTRATION] - Processing op: DisableKubernetesAtomicOperation
2018-05-19 16:38:49.874  INFO 1 --- [pool-4-thread-2] c.n.s.c.data.task.jedis.JedisTask        : [DISABLE] - Initializing disable operation...
2018-05-19 16:38:49.890  INFO 1 --- [pool-4-thread-2] c.n.s.c.data.task.jedis.JedisTask        : [DISABLE] - Looking up provided namespace...
2018-05-19 16:38:49.908  INFO 1 --- [pool-4-thread-2] c.n.s.c.data.task.jedis.JedisTask        : [DISABLE] - Finding requisite server group...
2018-05-19 16:38:49.955  INFO 1 --- [pool-4-thread-2] c.n.s.c.data.task.jedis.JedisTask        : [DISABLE] - Getting list of attached services...
2018-05-19 16:38:49.976  INFO 1 --- [pool-4-thread-2] c.n.s.c.data.task.jedis.JedisTask        : [DISABLE] - Resetting server group service template labels and selectors...
2018-05-19 16:39:00.465  INFO 1 --- [pool-4-thread-2] c.n.s.c.data.task.jedis.JedisTask        : [DISABLE] - Resetting service labels for each pod...
2018-05-19 16:39:00.518  INFO 1 --- [pool-4-thread-2] c.n.s.c.data.task.jedis.JedisTask        : [DISABLE] - Operating on 100% of pods
2018-05-19 16:39:05.204  INFO 1 --- [pool-4-thread-2] c.n.s.c.data.task.jedis.JedisTask        : [DISABLE] - Finished disabling server group.
2018-05-19 16:39:05.216  INFO 1 --- [pool-4-thread-2] c.n.s.c.data.task.jedis.JedisTask        : [ORCHESTRATION] - Orchestration completed.

However, the pods still have the load balancer labels and are therefore not disabled. Consequently, the Wait For Not Up Instances step hangs and never finishes.
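To check which pods still carry load balancer labels after the disable operation reports success, something like the following sketch could be used. It assumes that the Kubernetes V1 provider marks load balancer membership with pod labels of the form load-balancer-<name> set to "true" or "false"; that naming is an assumption here, not something shown in the logs above. The function names and the sample pod data are hypothetical.

```python
from typing import Dict, List

# Assumption: load balancer membership is expressed as a pod label named
# "load-balancer-<name>" whose value is "true" (attached) or "false" (detached).
LB_LABEL_PREFIX = "load-balancer-"


def attached_load_balancers(pod_labels: Dict[str, str]) -> List[str]:
    """Return the load balancer names whose label is still "true" on one pod."""
    return sorted(
        key[len(LB_LABEL_PREFIX):]
        for key, value in pod_labels.items()
        if key.startswith(LB_LABEL_PREFIX) and value == "true"
    )


def pods_still_enabled(pods: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each pod name to the load balancers it is still attached to.

    Pods with no remaining "true" load balancer label are left out, so an
    empty result means every pod was detached as expected.
    """
    result: Dict[str, List[str]] = {}
    for name, labels in pods.items():
        attached = attached_load_balancers(labels)
        if attached:
            result[name] = attached
    return result


if __name__ == "__main__":
    # Hypothetical label data, e.g. copied by hand from
    # `kubectl get pods --show-labels`.
    sample = {
        "myapp-v000-abcde": {"load-balancer-myapp": "true", "myapp-v000": "true"},
        "myapp-v000-fghij": {"load-balancer-myapp": "false", "myapp-v000": "true"},
    }
    print(pods_still_enabled(sample))
```

An empty dictionary would mean no pod is still attached; any entry points at a pod that the disable operation left attached, which would match the behaviour described here.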

Steps to Reproduce:

  1. Create a new application
  2. Create & deploy a new server group without any settings
  3. Destroy the server group
  4. Observe that this completes successfully
  5. Create a load balancer
  6. Create & deploy a new server group with that load balancer selected
  7. Destroy the server group
  8. Observe that the server group is displayed as DISABLED, but the Wait For Not Up Instances step never finishes

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 28 (13 by maintainers)

Most upvoted comments

@andreasevers can you confirm that this fails on version master-latest-unvalidated as well?

@spinnakerbot add-label provider/kubernetes-v1