spinnaker: Clouddriver: Unready - java.lang.OutOfMemoryError
Issue Summary:
After upgrading from 1.8.4 to 1.9.2, we find that after some time Clouddriver memory usage abruptly climbs to ~10GB, at which point Clouddriver falls over and reports Unready in GKE. Clouddriver remains Unready until the pod is terminated, after which the cycle repeats.
Cloud Provider(s):
GKE
Environment:
Node version: 1.10.4-gke.2
Node image: Container-Optimized OS (cos)
Machine type: n1-standard-8 (8 vCPUs, 30 GB memory)
Total cores: 16 vCPUs
Total memory: 60.00 GB
Spinnaker has been deployed to a cluster configured using Kubernetes v1 provider, while managing several other GKE clusters using Kubernetes v2 provider.
Feature Area:
Clouddriver
Additional Details:
We never knowingly experienced memory issues under previous Spinnaker releases. free -mh and historical metrics suggest that we have some spare capacity (~15%) at the point of failure, although we are provisioning larger nodes now.
Resource usage graph:
CPU fall-off coincides with Clouddriver unreadiness.

Log excerpts:
I
I org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:982) ~[spring-webmvc-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901) ~[spring-webmvc-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970) ~[spring-webmvc-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:861) ~[spring-webmvc-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:635) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846) ~[spring-webmvc-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:742) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) [tomcat-embed-websocket-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.springframework.boot.web.filter.ApplicationContextHeaderFilter.doFilterInternal(ApplicationContextHeaderFilter.java:55) [spring-boot-1.5.10.RELEASE.jar:1.5.10.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.springframework.web.filter.ShallowEtagHeaderFilter.doFilterInternal(ShallowEtagHeaderFilter.java:110) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.springframework.boot.actuate.trace.WebRequestTraceFilter.doFilterInternal(WebRequestTraceFilter.java:110) [spring-boot-actuator-1.5.10.RELEASE.jar:1.5.10.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:317) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:114) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:137) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:170) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:63) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:116) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:64) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at com.netflix.spinnaker.fiat.shared.FiatAuthenticationFilter.doFilter(FiatAuthenticationFilter.java:46) [fiat-api-0.49.6.jar:0.49.6]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:105) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:56) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:214) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:177) [spring-security-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:347) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:263) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.springframework.web.filter.HttpPutFormContentFilter.doFilterInternal(HttpPutFormContentFilter.java:108) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:81) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.springframework.boot.actuate.autoconfigure.MetricsFilter.doFilterInternal(MetricsFilter.java:106) [spring-boot-actuator-1.5.10.RELEASE.jar:1.5.10.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-4.3.14.RELEASE.jar:4.3.14.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-8.5.27.jar:8.5.27]
at javax.servlet.FilterChain$doFilter.call(Unknown Source) [tomcat-embed-core-8.5.27.jar:8.5.27]
at com.netflix.spinnaker.filters.AuthenticatedRequestFilter.doFilter(AuthenticatedRequestFilter.groovy:135) [kork-web-2.0.0.jar:2.0.0]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:803) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459) [tomcat-embed-core-8.5.27.jar:8.5.27]
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-embed-core-8.5.27.jar:8.5.27]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_171]
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-8.5.27.jar:8.5.27]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I 2018-09-04 18:18:49.882 ERROR 1 --- [.0-7002-exec-12] c.n.s.k.w.e.GenericExceptionHandlers : Internal Server Error
E Exception in thread "Exec Stream Pumper" java.lang.OutOfMemoryError: Java heap space
I 2018-09-04 18:18:17.055 INFO 1 --- [utionAction-597] c.n.s.c.cache.LoggingInstrumentation : kubernetes:gke_dl-platform_us-central1-b_platform-labv2/KubernetesNamespaceCachingAgent[1/1] completed in 29.51s
I 2018-09-04 18:18:14.748 INFO 1 --- [utionAction-605] c.n.s.c.cache.LoggingInstrumentation : com.netflix.spinnaker.clouddriver.kubernetes.v1.provider.KubernetesV1Provider:spinnaker/KubernetesConfigMapCachingAgent[1/1] completed in 132.956s
I 2018-09-04 18:18:14.746 INFO 1 --- [utionAction-597] s.c.k.v.c.a.KubernetesCacheDataConverter : gke_dl-platform_us-central1-b_platform-labv2/KubernetesNamespaceCachingAgent[1/1]: grouping applications has 10 entries and 11 relationships
I 2018-09-04 18:18:14.746 INFO 1 --- [utionAction-597] s.c.k.v.c.a.KubernetesCacheDataConverter : gke_dl-platform_us-central1-b_platform-labv2/KubernetesNamespaceCachingAgent[1/1]: grouping namespace has 11 entries and 11 relationships
I 2018-09-04 18:18:14.742 INFO 1 --- [utionAction-605] .k.v.p.a.KubernetesConfigMapCachingAgent : Caching 9 configmaps in spinnaker/KubernetesConfigMapCachingAgent[1/1]
I 2018-09-04 18:18:14.740 INFO 1 --- [utionAction-606] c.n.s.c.cache.LoggingInstrumentation : com.netflix.spinnaker.clouddriver.kubernetes.v1.provider.KubernetesV1Provider:spinnaker/KubernetesDeploymentCachingAgent[1/1] completed in 481.174s
I 2018-09-04 18:18:14.740 INFO 1 --- [utionAction-517] c.n.s.c.cache.LoggingInstrumentation : com.netflix.spinnaker.clouddriver.kubernetes.v1.provider.KubernetesV1Provider:spinnaker/KubernetesSecurityGroupCachingAgent[1/1] completed in 28.275s
I 2018-09-04 18:18:12.714 INFO 1 --- [utionAction-585] c.k.v.p.a.KubernetesInstanceCachingAgent : Describing items in spinnaker/KubernetesInstanceCachingAgent[1/1]
I 2018-09-04 18:18:12.710 INFO 1 --- [utionAction-605] .k.v.p.a.KubernetesConfigMapCachingAgent : Describing items in spinnaker/KubernetesConfigMapCachingAgent[1/1]
I 2018-09-04 18:18:11.652 INFO 1 --- [utionAction-606] k.v.p.a.KubernetesDeploymentCachingAgent : Caching 9 deployments in spinnaker/KubernetesDeploymentCachingAgent[1/1]
I 2018-09-04 18:18:11.651 INFO 1 --- [utionAction-517] .p.a.KubernetesSecurityGroupCachingAgent : Caching 0 security groups in spinnaker/KubernetesSecurityGroupCachingAgent[1/1]
I 2018-09-04 18:18:11.648 INFO 1 --- [utionAction-517] .p.a.KubernetesSecurityGroupCachingAgent : Describing items in spinnaker/KubernetesSecurityGroupCachingAgent[1/1]
I 2018-09-04 18:18:11.647 INFO 1 --- [utionAction-607] c.n.s.c.cache.LoggingInstrumentation : com.netflix.spinnaker.clouddriver.kubernetes.v1.provider.KubernetesV1Provider:spinnaker/KubernetesSecretCachingAgent[1/1] completed in 229.913s
I
I java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I 2018-09-04 18:18:07.851 WARN 1 --- [utionAction-609] c.n.s.c.cache.LoggingInstrumentation : com.netflix.spinnaker.clouddriver.docker.registry.provider.DockerRegistryProvider:spinnaker-usgcr-account/DockerRegistryImageCachingAgent[1/1] completed with one or more failures
I
I java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I 2018-09-04 18:18:07.841 WARN 1 --- [utionAction-593] c.n.s.c.cache.LoggingInstrumentation : com.netflix.spinnaker.clouddriver.kubernetes.v1.provider.KubernetesV1Provider:spinnaker/KubernetesControllersCachingAgent[1/1] completed with one or more failures
I
I java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I 2018-09-04 18:18:07.841 WARN 1 --- [utionAction-603] c.n.s.c.cache.LoggingInstrumentation : kubernetes:gke_dl-platform_us-central1-b_platform-labv2/KubernetesCoreCachingAgent[1/1] completed with one or more failures
I 2018-09-04 18:18:07.841 INFO 1 --- [utionAction-607] s.c.k.v.p.a.KubernetesSecretCachingAgent : Caching 0 secrets in spinnaker/KubernetesSecretCachingAgent[1/1]
I
I java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I 2018-09-04 18:17:48.155 WARN 1 --- [utionAction-602] c.n.s.c.cache.LoggingInstrumentation : com.netflix.spinnaker.clouddriver.kubernetes.v1.provider.KubernetesV1Provider:spinnaker/KubernetesLoadBalancerCachingAgent[1/1] completed with one or more failures
I
I java.lang.NullPointerException: null
I
I 2018-09-04 18:17:47.323 WARN 1 --- [utionAction-604] c.n.s.c.cache.LoggingInstrumentation : kubernetes:gke_dl-platform_us-central1-b_platform-labv2/KubernetesUnregisteredCustomResourceCachingAgent[1/1] completed with one or more failures
I 2018-09-04 18:17:46.464 INFO 1 --- [utionAction-597] c.n.s.c.k.v.c.a.KubernetesV2CachingAgent : gke_dl-platform_us-central1-b_platform-labv2/KubernetesNamespaceCachingAgent[1/1] is starting
I 2018-09-04 18:17:46.463 INFO 1 --- [utionAction-607] s.c.k.v.p.a.KubernetesSecretCachingAgent : Describing items in spinnaker/KubernetesSecretCachingAgent[1/1]
I
I java.lang.NullPointerException: null
at com.netflix.spinnaker.clouddriver.kubernetes.v1.security.KubernetesV1Credentials.getDeclaredNamespaces(KubernetesV1Credentials.java:140) ~[clouddriver-kubernetes-2.67.0-SNAPSHOT.jar:2.67.0-SNAPSHOT]
at com.netflix.spinnaker.clouddriver.kubernetes.caching.KubernetesCachingAgent.reloadNamespaces(KubernetesCachingAgent.java:64) [clouddriver-kubernetes-2.67.0-SNAPSHOT.jar:2.67.0-SNAPSHOT]
at sun.reflect.GeneratedMethodAccessor1071.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_171]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_171]
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:210) [groovy-all-2.4.13.jar:2.4.13]
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:59) [groovy-all-2.4.13.jar:2.4.13]
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:52) [groovy-all-2.4.13.jar:2.4.13]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:154) [groovy-all-2.4.13.jar:2.4.13]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:158) [groovy-all-2.4.13.jar:2.4.13]
at com.netflix.spinnaker.clouddriver.kubernetes.v1.provider.agent.KubernetesSecretCachingAgent.loadData(KubernetesSecretCachingAgent.groovy:56) [clouddriver-kubernetes-2.67.0-SNAPSHOT.jar:2.67.0-SNAPSHOT]
at com.netflix.spinnaker.cats.agent.CachingAgent$CacheExecution.executeAgentWithoutStore(CachingAgent.java:80) [cats-core-2.67.0-SNAPSHOT.jar:2.67.0-SNAPSHOT]
at com.netflix.spinnaker.cats.agent.CachingAgent$CacheExecution.executeAgent(CachingAgent.java:73) [cats-core-2.67.0-SNAPSHOT.jar:2.67.0-SNAPSHOT]
at com.netflix.spinnaker.cats.redis.cluster.ClusteredAgentScheduler$AgentExecutionAction.execute(ClusteredAgentScheduler.java:299) [cats-redis-2.67.0-SNAPSHOT.jar:2.67.0-SNAPSHOT]
at com.netflix.spinnaker.cats.redis.cluster.ClusteredAgentScheduler$AgentJob.run(ClusteredAgentScheduler.java:273) [cats-redis-2.67.0-SNAPSHOT.jar:2.67.0-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_171]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
I
I 2018-09-04 18:17:37.833 WARN 1 --- [utionAction-607] c.n.s.c.k.v.s.KubernetesV1Credentials : Could not determine kubernetes namespaces. Will try again later.
E Exception in thread "ContainerBackgroundProcessor[StandardEngine[Tomcat]]" java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I 2018-09-04 18:17:18.508 ERROR 1 --- [Engine[Tomcat]]] org.apache.catalina.core.ContainerBase : Unexpected death of background thread [ContainerBackgroundProcessor[StandardEngine[Tomcat]]]
I
I java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I 2018-09-04 18:17:18.508 ERROR 1 --- [igurationSource] c.n.config.AbstractPollingScheduler : Error getting result from polling source
E Exception in thread "Exec Default Executor" java.lang.OutOfMemoryError: GC overhead limit exceeded
E Exception in thread "Exec Stream Pumper" java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I 2018-09-04 18:16:42.597 ERROR 1 --- [igurationSource] c.n.config.AbstractPollingScheduler : Error getting result from polling source
E Exception in thread "Exec Stream Pumper" java.lang.OutOfMemoryError: Java heap space
E Exception in thread "Exec Stream Pumper" java.lang.OutOfMemoryError: GC overhead limit exceeded
I 2018-09-04 18:16:01.792 INFO 1 --- [utionAction-605] .k.v.p.a.KubernetesConfigMapCachingAgent : Loading config maps in spinnaker/KubernetesConfigMapCachingAgent[1/1]
I
I java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I 2018-09-04 18:15:52.187 WARN 1 --- [utionAction-598] c.n.s.c.cache.LoggingInstrumentation : com.netflix.spinnaker.clouddriver.kubernetes.v1.provider.KubernetesV1Provider:spinnaker/KubernetesServiceAccountCachingAgent[1/1] completed with one or more failures
I
I java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I 2018-09-04 18:15:50.724 WARN 1 --- [utionAction-517] c.n.s.c.cache.LoggingInstrumentation : com.netflix.spinnaker.clouddriver.kubernetes.v1.provider.KubernetesV1Provider:spinnaker/KubernetesSecurityGroupCachingAgent[1/1] completed with one or more failures
I
I java.lang.NullPointerException: null
I
I 2018-09-04 18:15:48.454 WARN 1 --- [utionAction-589] c.n.s.c.cache.LoggingInstrumentation : kubernetes:gke_dl-platform_us-central1-b_platformv2/KubernetesUnregisteredCustomResourceCachingAgent[1/1] completed with one or more failures
E Exception in thread "Exec Stream Pumper" java.lang.OutOfMemoryError: Java heap space
E Exception in thread "Exec Stream Pumper" java.lang.OutOfMemoryError: GC overhead limit exceeded
E Exception in thread "Exec Stream Pumper" java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I java.lang.OutOfMemoryError: GC overhead limit exceeded
I
I 2018-09-04 18:15:18.347 ERROR 1 --- [gentScheduler-1] c.n.s.c.r.c.ClusteredAgentScheduler : Unable to run agents
I
I java.lang.OutOfMemoryError: GC overhead limit exceeded
About this issue
- State: closed
- Created 6 years ago
- Comments: 15 (8 by maintainers)
I just did some profiling to look for improvements. The primary issue I found was that each caching cycle had a very large number of allocations, putting a lot of pressure on the garbage collector to keep up. Most of these allocations are coming from MergeCacheData as it builds up a single list of all relationships for each item in the cache. (Some optimization to reduce allocations in that function was already done in spinnaker/clouddriver#2863.)
Looking more broadly at how we’re building up cache relationships, the core issue is that the algorithm is `O(N)` in allocations and `O(N^2)` in allocated memory, where `N` is the number of pods in a given application, with the bottleneck being creating the application -> pod relationships for the application.
Merging relationships in `mergeCacheData` (linked above) is `O(1)` in allocations and `O(m + n)` in allocated memory, where `m` and `n` are the sizes of each of the relationship collections to merge. (We merge by creating a new hashset and then adding each of the two relationship collections.)
The issue is that `invertRelationships` creates one `CacheData` per relationship with a single item in its `relationships` collection. So we’re building up the collection of all pod relationships in the application’s `CacheData` by calling `mergeCacheData` `N` times (once for each of the `N` pods).
I’m working on a PR that would change `invertRelationships` to generate only a single `CacheData` for each item, already containing a full collection of its relationships, so we would only need to merge it in once per application rather than once per pod.
Also see #4321 linked above, which is a recent example of the same thing. I plan on digging a bit into this in the next couple of weeks.
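To make the quadratic cost described in the comment above concrete, here is a minimal sketch. It is not Clouddriver code: the class `RelationshipMergeSketch`, its methods, and the `copied` counter are hypothetical names invented for illustration, and the merge is assumed to behave as described (allocate a new hash set, then add both inputs). It counts how many elements are copied into new sets when an application's pod relationships are merged one pod at a time, versus merged once from a pre-built collection.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: a hypothetical stand-in, not Clouddriver's actual code.
class RelationshipMergeSketch {
    // Running count of elements copied into freshly allocated sets.
    static long copied = 0;

    // Merge as described above: allocate a new set, then add both inputs.
    // One allocation per call, O(m + n) elements copied.
    static Set<String> merge(Set<String> a, Set<String> b) {
        Set<String> out = new HashSet<>(a);
        out.addAll(b);
        copied += a.size() + b.size();
        return out;
    }

    // Per-pod approach: N single-item relationship sets, merged in one at a time.
    static long perPod(int n) {
        copied = 0;
        Set<String> app = new HashSet<>();
        for (int i = 0; i < n; i++) {
            Set<String> single = new HashSet<>();
            single.add("pod-" + i);
            app = merge(app, single); // copies i existing entries plus 1 new one
        }
        return copied;
    }

    // Batched approach: build the full relationship set first, then merge once.
    static long batched(int n) {
        copied = 0;
        Set<String> all = new HashSet<>();
        for (int i = 0; i < n; i++) {
            all.add("pod-" + i);
        }
        merge(new HashSet<>(), all);
        return copied;
    }

    public static void main(String[] args) {
        System.out.println("per-pod merges: " + perPod(1000));  // 500500
        System.out.println("single merge:   " + batched(1000)); // 1000
    }
}
```

With 1000 pods, the per-pod path copies N(N+1)/2 = 500500 elements across 1000 short-lived sets, while the batched path copies 1000. Under these assumptions, that difference is the kind of garbage-collector pressure the profiling above points to.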