spinnaker: Unable to read manifest 'ServiceEntry.networking.istio.io egress' in namespace for account (spinnaker account) ; orca unable to get status from clouddriver

Spinnaker : 1.13.0

Issue Summary:

Custom Resource is a kind supported by spinnaker. Adding customResource to hal config allows manifest deploy in kubernetes v2 account to proceed, however, throws exception in Orca when getting status from clouddriver.

Cloud Provider(s):

kubernetes V2

Environment:

Nonprod

Feature Area (if this issue is UI/UX related, please tag @spinnaker/ui-ux-team):

Orca / Clouddriver

Description:

kubectl apply works with the manifest. Pipeline with one stage to deploy the manifest successfully executes deploy task, fails in waiting for manifest to stabilize status update stage with exception after 30min. Actual deployment does go through. “Kubectl get serviceentry” also returns the correct information.

Steps to Reproduce:

Under spinnaker accounts, I have the following :

spinnaker accounts I have the following permissions :

- name: web-djin-productivity-virginia-nonprod-main
      environment: nonprod
      requiredGroupMembership: []
      providerVersion: V2
      permissions:
        READ:
        - djin-productivity
        WRITE:
        - djin-productivity
      dockerRegistries: []
      context: web-djin-productivity-virginia-nonprod-main
      configureImagePullSecrets: true
      cacheThreads: 1
      namespaces:
      - djin-productivity
      omitNamespaces: []
      kinds: []
      omitKinds: []
      customResources:
      - kubernetesKind: VirtualService.networking.istio.io
        versioned: false
      - kubernetesKind: Gateway.networking.istio.io
        versioned: false
      - kubernetesKind: ServiceEntry.networking.istio.io
        versioned: false
      cachingPolicies: []
      kubeconfigFile: /home/spinnaker/.kube/spinnaker_k8s_config/web-djin-productivity-virginia-nonprod-main-conf
      liveManifestCalls: true
      oauthScopes: []
      oAuthScopes: []
      onlySpinnakerManaged: false

I create serviceentry (istio) and get the following from clouddriver cache logs :

Apr  1 01:58:09 spinnaker clouddriver[4786]: 2019-04-01 01:58:09.332  INFO 4786 --- [utionAction-161] s.c.k.v.c.a.KubernetesCacheDataConverter : web-djin-productivity-virginia-nonprod-main/CustomKubernetes(ServiceEntry.networking.istio.io)[1/1]: grouping artifact has 2 entries and 2 relationships
Apr  1 01:58:09 spinnaker clouddriver[4786]: 2019-04-01 01:58:09.332  INFO 4786 --- [utionAction-161] s.c.k.v.c.a.KubernetesCacheDataConverter : web-djin-productivity-virginia-nonprod-main/CustomKubernetes(ServiceEntry.networking.istio.io)[1/1]: grouping applications has 1 entries and 2 relationships
Apr  1 01:58:09 spinnaker clouddriver[4786]: 2019-04-01 01:58:09.332  INFO 4786 --- [utionAction-161] s.c.k.v.c.a.KubernetesCacheDataConverter : web-djin-productivity-virginia-nonprod-main/CustomKubernetes(ServiceEntry.networking.istio.io)[1/1]: grouping ServiceEntry.networking.istio.io has 2 entries and 2 relationships

the role with rbac for the account has all permissions :

the role for this accounts ( RBAC) has necessary permissions too :

- apiGroups:
  - config.istio.io
  - networking.istio.io
  - rbac.istio.io
  - authentication.istio.io
  resources:
  - '*'
  verbs:
  - '*'

and Deploy the manifest using v2 account for Kubernetes. WaitForManifestStage task of Stage Deploy (manifest) timed out after 30min.

Additional Details:

Orca logs show the following exception. No errors in Clouddriver logs

Apr  1 17:43:12 spinnaker orca[4766]: 2019-04-01 17:43:12.656  INFO 4766 --- [    handlers-19] c.n.s.orca.clouddriver.OortService       : [Kuber.Kaul@dowjones.com] ---> HTTP GET http://localhost:7002/manifests/web-djin-productivity-virginia-nonprod-main/djin-productivity/ServiceEntry.networking.istio.io%20egress
Apr  1 17:43:32 spinnaker orca[4766]: 2019-04-01 17:43:32.657  INFO 4766 --- [    handlers-19] c.n.s.orca.clouddriver.OortService       : [Kuber.Kaul@dowjones.com] ---- ERROR http://localhost:7002/manifests/web-djin-productivity-virginia-nonprod-main/djin-productivity/ServiceEntry.networking.istio.io%20egress
Apr  1 17:43:32 spinnaker orca[4766]: 2019-04-01 17:43:32.658  INFO 4766 --- [    handlers-19] c.n.s.orca.clouddriver.OortService       : [Kuber.Kaul@dowjones.com] java.net.SocketTimeoutException: timeout
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.Okio$4.newTimeoutException(Okio.java:232)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.AsyncTimeout.exit(AsyncTimeout.java:285)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.retrofit.RetrofitConfiguration$1.intercept(RetrofitConfiguration.groovy:59)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.okhttp.OkHttp3MetricsInterceptor.intercept(OkHttp3MetricsInterceptor.java:43)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.RealCall.execute(RealCall.java:92)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.jakewharton.retrofit.Ok3Client.execute(Ok3Client.java:41)
Apr  1 17:43:32 spinnaker orca[4766]: #011at retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:326)
Apr  1 17:43:32 spinnaker orca[4766]: #011at retrofit.RestAdapter$RestHandler.invoke(RestAdapter.java:240)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.sun.proxy.$Proxy95.getManifest(Unknown Source)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.clouddriver.DelegatingOortService.getManifest(DelegatingOortService.java:49)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.clouddriver.tasks.manifest.WaitForManifestStableTask.execute(WaitForManifestStableTask.java:83)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$2$1.invoke(RunTaskHandler.kt:100)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$2$1.invoke(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withLoggingContext(RunTaskHandler.kt:316)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.access$withLoggingContext(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$2.invoke(RunTaskHandler.kt:99)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$2.invoke(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.AuthenticationAware$sam$java_util_concurrent_Callable$0.call(AuthenticationAware.kt)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.security.AuthenticatedRequest.lambda$propagate$0(AuthenticatedRequest.java:97)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.AuthenticationAware$DefaultImpls.withAuth(AuthenticationAware.kt:50)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withAuth(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1.invoke(RunTaskHandler.kt:98)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1.invoke(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$withTask$1.invoke(RunTaskHandler.kt:182)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$withTask$1.invoke(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withTask$1.invoke(OrcaMessageHandler.kt:49)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withTask$1.invoke(OrcaMessageHandler.kt:33)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withStage$1.invoke(OrcaMessageHandler.kt:59)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withStage$1.invoke(OrcaMessageHandler.kt:33)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.withExecution(OrcaMessageHandler.kt:68)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withExecution(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.withStage(OrcaMessageHandler.kt:55)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withStage(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.withTask(OrcaMessageHandler.kt:42)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withTask(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withTask(RunTaskHandler.kt:175)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.handle(RunTaskHandler.kt:69)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.handle(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.q.MessageHandler$DefaultImpls.invoke(MessageHandler.kt:36)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.invoke(OrcaMessageHandler.kt)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.audit.ExecutionTrackingMessageHandlerPostProcessor$ExecutionTrackingMessageHandlerProxy.invoke(ExecutionTrackingMessageHandlerPostProcessor.kt:47)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.q.QueueProcessor$pollOnce$1$1.run(QueueProcessor.kt:82)
Apr  1 17:43:32 spinnaker orca[4766]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Apr  1 17:43:32 spinnaker orca[4766]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Apr  1 17:43:32 spinnaker orca[4766]: #011at java.lang.Thread.run(Thread.java:748)
Apr  1 17:43:32 spinnaker orca[4766]: Caused by: java.net.SocketException: Socket closed
Apr  1 17:43:32 spinnaker orca[4766]: #011at java.net.SocketInputStream.read(SocketInputStream.java:204)
Apr  1 17:43:32 spinnaker orca[4766]: #011at java.net.SocketInputStream.read(SocketInputStream.java:141)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.Okio$2.read(Okio.java:140)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
Apr  1 17:43:32 spinnaker orca[4766]: #011... 66 more
Apr  1 17:43:32 spinnaker orca[4766]: 2019-04-01 17:43:32.658  INFO 4766 --- [    handlers-19] c.n.s.orca.clouddriver.OortService       : [Kuber.Kaul@dowjones.com] ---- END ERROR
Apr  1 17:43:32 spinnaker orca[4766]: 2019-04-01 17:43:32.658  WARN 4766 --- [    handlers-19] c.n.s.o.c.t.m.WaitForManifestStableTask  : [Kuber.Kaul@dowjones.com] Unable to read manifest 'ServiceEntry.networking.istio.io egress' in 'djin-productivity' for account web-djin-productivity-virginia-nonprod-main
Apr  1 17:43:32 spinnaker orca[4766]: retrofit.RetrofitError: timeout
Apr  1 17:43:32 spinnaker orca[4766]: #011at retrofit.RetrofitError.networkError(RetrofitError.java:27)
Apr  1 17:43:32 spinnaker orca[4766]: #011at retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:395)
Apr  1 17:43:32 spinnaker orca[4766]: #011at retrofit.RestAdapter$RestHandler.invoke(RestAdapter.java:240)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.sun.proxy.$Proxy95.getManifest(Unknown Source)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.clouddriver.DelegatingOortService.getManifest(DelegatingOortService.java:49)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.clouddriver.tasks.manifest.WaitForManifestStableTask.execute(WaitForManifestStableTask.java:83)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$2$1.invoke(RunTaskHandler.kt:100)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$2$1.invoke(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withLoggingContext(RunTaskHandler.kt:316)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.access$withLoggingContext(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$2.invoke(RunTaskHandler.kt:99)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$2.invoke(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.AuthenticationAware$sam$java_util_concurrent_Callable$0.call(AuthenticationAware.kt)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.security.AuthenticatedRequest.lambda$propagate$0(AuthenticatedRequest.java:97)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.AuthenticationAware$DefaultImpls.withAuth(AuthenticationAware.kt:50)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withAuth(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1.invoke(RunTaskHandler.kt:98)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1.invoke(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$withTask$1.invoke(RunTaskHandler.kt:182)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$withTask$1.invoke(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withTask$1.invoke(OrcaMessageHandler.kt:49)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withTask$1.invoke(OrcaMessageHandler.kt:33)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withStage$1.invoke(OrcaMessageHandler.kt:59)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withStage$1.invoke(OrcaMessageHandler.kt:33)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.withExecution(OrcaMessageHandler.kt:68)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withExecution(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.withStage(OrcaMessageHandler.kt:55)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withStage(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.withTask(OrcaMessageHandler.kt:42)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withTask(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withTask(RunTaskHandler.kt:175)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.handle(RunTaskHandler.kt:69)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.handle(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.q.MessageHandler$DefaultImpls.invoke(MessageHandler.kt:36)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.invoke(OrcaMessageHandler.kt)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:56)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.q.audit.ExecutionTrackingMessageHandlerPostProcessor$ExecutionTrackingMessageHandlerProxy.invoke(ExecutionTrackingMessageHandlerPostProcessor.kt:47)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.q.QueueProcessor$pollOnce$1$1.run(QueueProcessor.kt:82)
Apr  1 17:43:32 spinnaker orca[4766]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Apr  1 17:43:32 spinnaker orca[4766]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Apr  1 17:43:32 spinnaker orca[4766]: #011at java.lang.Thread.run(Thread.java:748)
Apr  1 17:43:32 spinnaker orca[4766]: Caused by: java.net.SocketTimeoutException: timeout
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.Okio$4.newTimeoutException(Okio.java:232)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.AsyncTimeout.exit(AsyncTimeout.java:285)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.orca.retrofit.RetrofitConfiguration$1.intercept(RetrofitConfiguration.groovy:59)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.netflix.spinnaker.okhttp.OkHttp3MetricsInterceptor.intercept(OkHttp3MetricsInterceptor.java:43)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okhttp3.RealCall.execute(RealCall.java:92)
Apr  1 17:43:32 spinnaker orca[4766]: #011at com.jakewharton.retrofit.Ok3Client.execute(Ok3Client.java:41)
Apr  1 17:43:32 spinnaker orca[4766]: #011at retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:326)
Apr  1 17:43:32 spinnaker orca[4766]: #011... 39 common frames omitted
Apr  1 17:43:32 spinnaker orca[4766]: Caused by: java.net.SocketException: Socket closed
Apr  1 17:43:32 spinnaker orca[4766]: #011at java.net.SocketInputStream.read(SocketInputStream.java:204)
Apr  1 17:43:32 spinnaker orca[4766]: #011at java.net.SocketInputStream.read(SocketInputStream.java:141)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.Okio$2.read(Okio.java:140)
Apr  1 17:43:32 spinnaker orca[4766]: #011at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
Apr  1 17:43:32 spinnaker orca[4766]: #011... 66 common frames omitted

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 33 (13 by maintainers)

Most upvoted comments

I am too facing this issue of long response from clouddriver-caching for CRD with full names which is making the Spinnaker pipeline fail due to timeout even though the deployment is successful. image

Spinnaker: 1.24.4 AWS EKS: 1.18 clouddriver-caching: us-docker.pkg.dev/spinnaker-community/docker/clouddriver:7.2.2-20210125200730 orca: us-docker.pkg.dev/spinnaker-community/docker/orca:2.18.1-20210128030017

We are running Spinnaker in ~HA. image

orca-local.yml configuration

okHttpClient:
  connectTimeoutMs: 120000
  readTimeoutMs: 120000

ORCA Logs

2021-03-15 07:56:10.825  WARN 1 --- [    handlers-16] c.n.s.o.c.t.m.WaitForManifestStableTask  : [myuser@myorg.com] Unable to read manifest 'Output.logging.banzaicloud.io <OBJECT>' in '<NAMESPACE>' for account <ACCOUNT>
retrofit.RetrofitError: timeout
	at retrofit.RetrofitError.networkError(RetrofitError.java:27)
	at retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:395)
	at retrofit.RestAdapter$RestHandler.invoke(RestAdapter.java:240)
	at com.sun.proxy.$Proxy151.getManifest(Unknown Source)
	at com.netflix.spinnaker.orca.clouddriver.DelegatingOortService.getManifest(DelegatingOortService.java:49)
	at com.netflix.spinnaker.orca.clouddriver.tasks.manifest.WaitForManifestStableTask.execute(WaitForManifestStableTask.java:89)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$1$1.invoke(RunTaskHandler.kt:143)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$1$1.invoke(RunTaskHandler.kt:75)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withLoggingContext(RunTaskHandler.kt:416)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.access$withLoggingContext(RunTaskHandler.kt:75)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$1.invoke(RunTaskHandler.kt:105)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$1.invoke(RunTaskHandler.kt:75)
	at com.netflix.spinnaker.orca.q.handler.AuthenticationAware$sam$java_util_concurrent_Callable$0.call(AuthenticationAware.kt)
	at com.netflix.spinnaker.security.AuthenticatedRequest.lambda$wrapCallableForPrincipal$0(AuthenticatedRequest.java:272)
	at com.netflix.spinnaker.orca.q.handler.AuthenticationAware$DefaultImpls.withAuth(AuthenticationAware.kt:53)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withAuth(RunTaskHandler.kt:75)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1.invoke(RunTaskHandler.kt:104)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1.invoke(RunTaskHandler.kt:75)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$withTask$1.invoke(RunTaskHandler.kt:244)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$withTask$1.invoke(RunTaskHandler.kt:75)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withTask$1.invoke(OrcaMessageHandler.kt:68)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withTask$1.invoke(OrcaMessageHandler.kt:46)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withStage$1.invoke(OrcaMessageHandler.kt:85)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withStage$1.invoke(OrcaMessageHandler.kt:46)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.withExecution(OrcaMessageHandler.kt:95)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withExecution(RunTaskHandler.kt:75)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.withStage(OrcaMessageHandler.kt:74)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withStage(RunTaskHandler.kt:75)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.withTask(OrcaMessageHandler.kt:60)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withTask(RunTaskHandler.kt:75)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withTask(RunTaskHandler.kt:233)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.handle(RunTaskHandler.kt:101)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.handle(RunTaskHandler.kt:75)
	at com.netflix.spinnaker.q.MessageHandler$DefaultImpls.invoke(MessageHandler.kt:36)
	at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.invoke(OrcaMessageHandler.kt)
	at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:75)
	at com.netflix.spinnaker.orca.q.audit.ExecutionTrackingMessageHandlerPostProcessor$ExecutionTrackingMessageHandlerProxy.invoke(ExecutionTrackingMessageHandlerPostProcessor.kt:72)
	at com.netflix.spinnaker.q.QueueProcessor$callback$1$1.run(QueueProcessor.kt:90)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.net.SocketTimeoutException: timeout
	at okio.Okio$4.newTimeoutException(Okio.java:232)
	at okio.AsyncTimeout.exit(AsyncTimeout.java:286)
	at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
	at okio.RealBufferedSource.indexOf(RealBufferedSource.java:358)
	at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:230)
	at okhttp3.internal.http1.Http1ExchangeCodec.readHeaderLine(Http1ExchangeCodec.java:242)
	at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.java:213)
	at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.java:115)
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:94)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
	at brave.okhttp3.TracingInterceptor.intercept(TracingInterceptor.java:74)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:43)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
	at com.netflix.spinnaker.okhttp.MetricsInterceptor.doIntercept(MetricsInterceptor.java:74)
	at com.netflix.spinnaker.okhttp.OkHttp3MetricsInterceptor.intercept(OkHttp3MetricsInterceptor.java:36)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229)
	at okhttp3.RealCall.execute(RealCall.java:81)
	at com.jakewharton.retrofit.Ok3Client.execute(Ok3Client.java:41)
	at retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:326)
	... 39 common frames omitted
Caused by: java.net.SocketException: Socket closed
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:183)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
	at okio.Okio$2.read(Okio.java:140)
	at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
	... 68 common frames omitted
# When queried with short name of CRD
time curl -H "X-SPINNAKER-USER: myuser@myorg.com" http://spin-clouddriver-caching.spinnaker:7002/manifests/<ACCOUNT>/<NAMESPACE>/Output%20<OBJECT-NAME>/
real	0m0.098s
user	0m0.003s
sys	0m0.000s

# When queried with the full name of CRD the ORCA gets timeout because of a long response from clouddriver
time curl -H "X-SPINNAKER-USER: myuser@myorg.com" http://spin-clouddriver-caching.spinnaker:7002/manifests/<ACCOUNT>/<NAMESPACE>/Output.logging.banzaicloud.io%20<OBJECT-NAME>/
real	5m34.941s
user	0m0.008s
sys	0m0.003s

After adding the CRD in customResource of the <ACCOUNT> under the halyard configuration the query with the full name of CRD has become drastically faster but still slower compared to the short name.

      - name: ACCOUNT                                                                                                        
        requiredGroupMembership: []                                                                                                                   
        permissions:                                                                                                                                  
          READ:                                                                                                                                       
          - <TEAM>-dev                                                                                                                                   
          - <TEAM>-sre                                                                                                                                   
          WRITE:                                                                                                                                      
          - <TEAM>-dev                                                                                                                                   
          - <TEAM>-sre                                                                                                                                   
        dockerRegistries: []                                                                                                                          
        providerVersion: V2                                                                                                                           
        context: <KUBERNETES-CONTEXT>                                                                             
        configureImagePullSecrets: true                                                                                                               
        cacheThreads: 1                                                                                                                               
        namespaces:                                                                                                                                   
        - <NAMESPACE>                                                                                                                        
        omitNamespaces: []                                                                                                                            
        kinds: []                                                                                                                                     
        omitKinds: []                                                                                                                                 
        customResources:                                                                                                                              
        - kubernetesKind: Output.logging.banzaicloud.io                                                                                               
          versioned: false                                                                                                                            
        - kubernetesKind: Flow.logging.banzaicloud.io                                                                                                 
          versioned: false                                                                                                                            
        cachingPolicies: []                                                                                                                           
        kubeconfigFile: /home/spinnaker/.kube/ACCOUNT-kubeconfig                                                             
        oAuthScopes: []                                                                                                                               
        onlySpinnakerManaged: false

Time taken to:

# When queried with the short name of CRD the response is almost as fast as it was previous
time curl -H "X-SPINNAKER-USER: myuser@myorg.com" http://spin-clouddriver-caching.spinnaker:7002/manifests/<ACCOUNT>/<NAMESPACE>/Output%20<OBJECT-NAME>/
real	0m0.011s
user	0m0.003s
sys	0m0.000s

# When queried with full name of CRD
time curl -H "X-SPINNAKER-USER: myuser@myorg.com" http://spin-clouddriver-caching.spinnaker:7002/manifests/<ACCOUNT>/<NAMESPACE>/Output.logging.banzaicloud.io%20<OBJECT-NAME>/
real	0m13.576s
user	0m0.008s
sys	0m0.003s

One thing I am not sure about is why suddenly this is happening. We were using various CRD for a while now and the deployments were happening fine and the Spinnaker pipelines were also showing success.

Also, sometimes the timeout happens for some native Kubernetes objects like Secret in various namespaces.

2021-03-15 07:47:04.461  WARN 1 --- [    handlers-16] c.n.s.o.c.t.m.WaitForManifestStableTask  : [myuser@myorg.com Unable to read manifest 'deployment <OBJECTX>' in '<NAMESPACE-X>' for account <ACCOUNT-X>
2021-03-15 08:05:37.417  WARN 1 --- [    handlers-19] c.n.s.o.c.t.m.WaitForManifestStableTask  : [myuser@myorg.com] Unable to read manifest 'secret <OBJECT>-v000' in '<NAMESPACE-Y>' for account <ACCOUNT>

I also noticed that whenever I am adding customResources in Halyard configuration and doing a hal deploy apply the clouddriver-ro, clouddriver-ro-deck, and clouddriver-rw pods are getting re-created not the clouddriver-caching pods. image

@castlemilk A break down of how we’re running spinnaker and what it is targetting:

  • deployed on kubernetes using halyard
  • originally deployed spinnaker v1.9 in 2018, been patching the same instance since then, but stuck on 1.16 because every attempt to upgrade above that has caused issues
  • deployed in HA mode for clouddriver & echo (have tried running individual copies of each clouddriver, and multiple copies, was never able to tell from the documentation or the spinnaker slack which is the correct approach, but needer seems notably better than the other)
  • provisioned reasonably beefy resources for the clouddrivers (~16GB/3CPU for clouddriver-caching, less for the others)
  • targeting seven k8s clusters, distributed globally
  • all k8s clusters (including the one the spinnaker runs on) are deployed to AWS using Kops
  • each cluster has less than ten namespaces
  • each namespace might have hundreds of microservices and associated deployments, services, configmaps & secrets
  • we are heavy users of CRDs, with dozens of custom resources defined on each cluster, and hundreds if not thousands of individual CRs deployed
  • we deploy tens of pipelines per day that will run through a dozen or so namespaces across multiple clusters

@ezimanyi could you say if in the running clouddriver in HA mode, it should also be run with multiple replicas of each sub-service? e.g. two replicas of clouddriver-caching, two replicas of clouddriver-rw, etc? And also, is HA mode compatible with running with a MySQL backend instead of redis? The docs here, https://www.spinnaker.io/reference/halyard/high-availability/#ha-clouddriver, only refer to pointing clouddriver-ro & clouddriver-ro-deck to a redis read replica. Should equivalent steps be taken if using a MySQL backend?

My sense is that some of the issue that I’m seeing is related to multiple processes trashing each other’s caches, so I’m inclined to disable HA mode as an experiment.

we are experiencing similar issue with spinnaker 1.17: Pipelines get stuck whenever a CRD is involved, but pipelines that only use out of the box resources work fine. The errors surface in Orca, and are indicative of a timeout:

INFO 1 --- [    handlers-18] c.n.s.orca.clouddriver.OortService       : [username@org.com] ---- ERROR http://spin-clouddriver.csp-spinnaker:7002/manifests/fabric-services-sit/fabric-services-sit/SealedSecret.bitnami.com%20database-credentials?includeEvents=false

What we see when exec’d onto the clouddriver pod is that calls to the target cluster for sealedSecret.bitnami.com take over 30 seconds but calls to sealedSecret take under a second. The kubectl version on clouddriver is 1.16.0

We see the same difference in response times when running curl commands from the hal pod

curl -H "X-SPINNAKER-USER: username@org.com" http://spin-clouddriver.csp-spinnaker:7002/manifests/fabric-services-sit/fabric-services-sit/SealedSecret.bitnami.com%20database-credentials
response time > 30 seconds

curl -H "X-SPINNAKER-USER: username@org.com" http://spin-clouddriver.csp-spinnaker:7002/manifests/fabric-services-sit/fabric-services-sit/SealedSecret%20database-credentials
response time < 1 second

We have switched off the liveManifestCalls but now get faced with the problem of the force cache refresh taking 12 minutes which is making every pipeline take an unacceptably long time.

Thanks again. Looking at the history of the dockerfiles, it looks like it’s been pinned to kubectl version 1.16.x since spinnaker release 1.15.x. Prior to that, it looks like it was pulling whatever was the latest stable release of kubectl, which would have caused some confusion on new patches, I imagine!

My targets k8s clusters are all on 1.15.7 and 1.16.7, so I don’t think that version incompatibility is a part of the issue we are seeing. I’ll let you know when I know more.

@blaffoy : There’s no nicely-formatted compatibility matrix; the best I can suggest is looking at the commit history of the Dockerfile to see when the version of kubectl was bumped (though I guess that would also be in the release notes for the corresponding version). As I noted above, I’d like to just stop using kubectl entirely and move to the client library which should improve the story around this.

I just tested this and actually see a slightly slower call to kubectl when asking it for a fully qualified name instead of a short name.

kubectl get CronTab.stable.example.com is ~3x slower than kubectl get CronTab. (In my case only 0.2s vs 0.07s) I dug a bit into it and it looks like the fully-qualified call always invalidates the discovery information cache:

discovery.go:214] Invalidating discovery information

I’m guessing that for a larger cluster, regenerating the discovery information cache could be more expensive and maybe take ~30s. I have no idea why one version of the call would invalidate the cache and the other wouldn’t but that seems like it may be the root cause of this issue.