spinnaker: ECS deploys broken in release-1.19.x-latest-validated

Issue Summary:

After upgrading to release-1.19.x-latest-unvalidated, I was unable to deploy an ECS service.

Cloud Provider(s):

ecs

Environment:

local debian

Feature Area:

clouddriver, ecs

Description:

After upgrading to release-1.19.x-latest-unvalidated, the ECS deploy (test-v009) appeared to hang indefinitely.

The Clouddriver and Gate logs show the following:

Mar 06 22:52:06 ip-172-28-10-142 clouddriver[25314]: 2020-03-06 22:52:06.410  INFO 25314 --- [0.1-7002-exec-7] c.n.s.c.e.p.v.EcsServerClusterProvider   : No ECS Server Groups were found with the name test-v009
Mar 06 22:52:06 ip-172-28-10-142 gate[25305]: 2020-03-06 22:52:06.420  INFO 25305 --- [-serverGroups-9] c.n.s.g.s.internal.ClouddriverService    : <--- HTTP 404 http://localhost:7002/applications/test/serverGroups/stage-ecs/us-east-2/test-v009?includeDetails=false (191ms)

It looks as if the newly created server group never makes it into Redis:

redis> keys *test-v009*
(empty list or set)
(1.26s)
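
The same lookup can be reproduced outside Gate. Below is a minimal sketch, assuming Clouddriver listens on localhost:7002 as in the log line above. The function names are hypothetical, and the URL shape is copied from that 404 line rather than from any API reference:

```python
import urllib.error
import urllib.request
from urllib.parse import quote, urlencode


def server_group_url(app, account, region, name, base="http://localhost:7002"):
    """Build the server group URL seen in the Gate log (hypothetical helper).

    The base address is an assumption taken from the log line; adjust it
    if Clouddriver runs elsewhere.
    """
    # Percent-encode each path segment so unusual names cannot break the path.
    segments = [quote(part, safe="") for part in (app, account, region, name)]
    path = "/applications/{}/serverGroups/{}/{}/{}".format(*segments)
    return base + path + "?" + urlencode({"includeDetails": "false"})


def server_group_exists(url, timeout=5):
    """Return True on HTTP 200 and False on HTTP 404; other errors propagate."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


# Example (needs a running Clouddriver, so it is left commented out):
# url = server_group_url("test", "stage-ecs", "us-east-2", "test-v009")
# print(url, server_group_exists(url))
```

Calling `server_group_exists` repeatedly while the deploy is running would show whether the server group ever becomes visible to Clouddriver or keeps returning 404.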

Steps to Reproduce:

Upgrade to release-1.19.x-latest-unvalidated and start an ECS deploy

Additional Details:

There is some odd behavior in Deck as well: items in the cluster view disappear periodically.

[Two screenshots: 2020-03-06 at 23:41:30 and 2020-03-06 at 23:39:46]

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 15

Most upvoted comments

Glad to hear you’re seeing some improvement, @awsiv! Though I’d like to get to the bottom of things hanging when using load balancer health.

BTW, is this available in 1.19.0?

https://github.com/spinnaker/clouddriver/pull/4417 will be available in 1.19.1, which I understand will be out very soon.

re: closing this - in https://github.com/spinnaker/spinnaker/issues/5528 it sounds like deployments are hanging with either load balancer or cloud provider health checks, so I’m ok with keeping this open until we understand what’s happening in your case.

@allisaurus I upgraded to 1.19.1 today and deploys seem to be working fine 🎉 Deployments are no longer stuck in Wait For Up Instances,

so it looks like a one-off issue. Thanks for your help in solving this issue! 💯 👍