spinnaker: The Rosco service does not handle hal redeployments with updated/new configs gracefully during active baking

Issue Summary:

The Rosco service does not handle hal redeployments with updated/new configs gracefully during active baking. Image baking can take around 20 minutes, but during a redeployment the Rosco pod is terminated within a minute, without waiting for the baking job to complete.

Cloud Provider(s):

GCP Google Container Engine

Environment:

Feature Area (if this issue is UI/UX related, please tag @spinnaker/ui-ux-team):

Halyard deployments, Rosco Bake stage

Description:

Steps to Reproduce:

When running hal deploy apply to modify or add a Packer template, or to set up a Google service account, Rosco is redeployed with the new configs. If any bakes are running at that time, the image baking pipeline stage hangs indefinitely. The workaround is to trigger a forced rebake. For our use cases this is a serious problem, and it has been escalated to a P0 internally.

Additional Details:

The following error, "java.lang.IllegalStateException: Pool not open", occurs on the Rosco service that is terminated and replaced during hal redeployment.

2019-03-13 20:43:08.062 ERROR 1 --- [RxIoScheduler-2] c.n.spinnaker.rosco.executor.BakePoller : Update Polling Error:

redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
    at redis.clients.util.Pool.getResource(Pool.java:53) ~[jedis-2.9.0.jar:na]
    at redis.clients.jedis.JedisPool.getResource(JedisPool.java:226) ~[jedis-2.9.0.jar:na]
    at redis.clients.jedis.JedisPool$getResource.call(Unknown Source) ~[na:na]
    at com.netflix.spinnaker.rosco.persistence.RedisBackedBakeStore.getThisInstanceIncompleteBakeIds(RedisBackedBakeStore.groovy:518) ~[rosco-core.jar:na]
    at sun.reflect.GeneratedMethodAccessor86.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_191]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_191]
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98) [groovy-all-2.4.15.jar:2.4.15]
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325) [groovy-all-2.4.15.jar:2.4.15]
    at org.codehaus.groovy.runtime.metaclass.MethodMetaProperty$GetBeanMethodMetaProperty.getProperty(MethodMetaProperty.java:76) ~[groovy-all-2.4.15.jar:2.4.15]
    at org.codehaus.groovy.runtime.callsite.GetEffectivePogoPropertySite.callGetProperty(GetEffectivePogoPropertySite.java:48) ~[groovy-all-2.4.15.jar:2.4.15]
    at com.netflix.spinnaker.rosco.executor.BakePoller$_onApplicationEvent_closure1.doCall(BakePoller.groovy:81) [rosco-core.jar:na]
    at com.netflix.spinnaker.rosco.executor.BakePoller$_onApplicationEvent_closure1.doCall(BakePoller.groovy) [rosco-core.jar:na]
    at sun.reflect.GeneratedMethodAccessor85.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_191]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_191]
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98) [groovy-all-2.4.15.jar:2.4.15]
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325) [groovy-all-2.4.15.jar:2.4.15]
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:264) [groovy-all-2.4.15.jar:2.4.15]
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1034) [groovy-all-2.4.15.jar:2.4.15]
    at groovy.lang.Closure.call(Closure.java:418) [groovy-all-2.4.15.jar:2.4.15]
    at org.codehaus.groovy.runtime.ConvertedClosure.invokeCustom(ConvertedClosure.java:54) [groovy-all-2.4.15.jar:2.4.15]
    at org.codehaus.groovy.runtime.ConversionHandler.invoke(ConversionHandler.java:124) [groovy-all-2.4.15.jar:2.4.15]
    at com.sun.proxy.$Proxy113.call(Unknown Source) [na:na]
    at rx.internal.schedulers.SchedulePeriodicHelper$1.call(SchedulePeriodicHelper.java:72) [rxjava-1.3.8.jar:1.3.8]
    at rx.internal.schedulers.CachedThreadScheduler$EventLoopWorker$1.call(CachedThreadScheduler.java:230) [rxjava-1.3.8.jar:1.3.8]
    at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55) [rxjava-1.3.8.jar:1.3.8]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_191]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
Caused by: java.lang.IllegalStateException: Pool not open
    at org.apache.commons.pool2.impl.BaseGenericObjectPool.assertOpen(BaseGenericObjectPool.java:672) ~[commons-pool2-2.4.2.jar:2.4.2]
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:412) ~[commons-pool2-2.4.2.jar:2.4.2]
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363) ~[commons-pool2-2.4.2.jar:2.4.2]
    at redis.clients.util.Pool.getResource(Pool.java:49) ~[jedis-2.9.0.jar:na]
    ... 33 common frames omitted
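
The trace suggests that the periodic BakePoller task was still running after the Redis connection pool had been closed during shutdown. To illustrate the behaviour this issue is asking for, here is a minimal sketch of a shutdown sequence that stops accepting new bakes, waits for in-flight bakes up to a deadline, and only then closes the backing store. This is not Rosco's actual code: the class BakeDrainSketch and all of its methods are hypothetical, and the "store" is just a boolean flag standing in for the Redis pool. On Kubernetes, a wait like this would also have to fit within the pod's termination grace period.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of draining in-flight bakes before shutdown. Not Rosco code. */
public class BakeDrainSketch {
    // IDs of bakes that have started but not yet completed.
    private final Set<String> activeBakes = ConcurrentHashMap.newKeySet();
    private boolean acceptingBakes = true;
    // Stand-in for the Redis connection pool; true means "open".
    private volatile boolean storeOpen = true;

    /** Registers a bake; returns false if shutdown has already begun. */
    public synchronized boolean startBake(String bakeId) {
        if (!acceptingBakes) {
            return false;
        }
        activeBakes.add(bakeId);
        return true;
    }

    /** Marks a bake as finished. */
    public void completeBake(String bakeId) {
        activeBakes.remove(bakeId);
    }

    public boolean isStoreOpen() {
        return storeOpen;
    }

    private synchronized void stopAccepting() {
        acceptingBakes = false;
    }

    /**
     * Stops accepting bakes, waits up to timeoutMillis for active bakes to
     * finish, then closes the store. Returns true if every bake finished
     * before the store was closed.
     */
    public boolean shutdown(long timeoutMillis) {
        stopAccepting();
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!activeBakes.isEmpty() && System.nanoTime() < deadline) {
            try {
                Thread.sleep(10); // poll until drained or the deadline passes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
        }
        boolean drained = activeBakes.isEmpty();
        storeOpen = false; // close the store only after the drain attempt
        return drained;
    }
}
```

If shutdown returns false, some bakes were still running when the store was closed, which corresponds to the hang described above; a real service would need to record or hand off those bakes so that the pipeline stage can fail or resume instead of waiting forever.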

Halyard Release 1.14.0-20190117020510

Spinnaker Release 1.11.4 & 1.12.4


About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 23 (3 by maintainers)

Most upvoted comments

Seems like baking-during-deploys will never get stabilized. RIP