spinnaker: Orca parses Kubernetes yaml and logs it (& fails pipeline)

Issue Summary:

Similar to #5585 , Orca is logging some of a Kubernetes ConfigMap manifest during a Deploy Manifest stage.

I have three hypotheses:

  1. SPeL execution somewhere as some of these ConfigMap’s are in json (not base64 encoded) and have strings like ${prometheus_url:raw} and similar $.... SPeL execution is disabled in the pipeline stage though. Slack thread talking about SPeL evaluation issues Related PR? @marchello2000 : https://github.com/spinnaker/orca/pull/3286

  2. Aggregate size of all manifest artifacts. This is because the pipeline rarely fails when it fans to 2 deploy stages, but tends to always fail at fanning to 4+ deploy stages. All cluster manifests are pretty much the same.

  3. Oversized YAML manifest (~3.6MB) as suggested in this slack thread as Tomcat maxPostSize default 2MB.

Cloud Provider(s):

Kubernetes

Environment:

  • Spinnaker on Kubernetes backed by Aurora MySQL (Orca/Clouddriver/Front50) deploying to Kubernetes
  • We use account restrictions via Fiat and a service account for triggers to use.
  • Pipelines triggered by Webhook

Feature Area:

Orca & Clouddriver-ro ?

Description:

Steps to Reproduce:

  1. Upload large yamls (~4MB) to S3 (we have 7)
  2. Webhook Trigger spinnaker with S3 artifact locations
  3. Fan out to multiple Deploy Stages
  4. Observe error.

Additional Details:

We could try increasing Tomcat maxPostSize but not sure which service exactly, for example:

      orca-local.yml: |-
        server:
          tomcat:
            maxPostSize: 0

Logs:


| 2020-03-27 07:41:04.976 | 2020-03-27 07:41:04.976  INFO 1 --- [pool-2-thread-1] c.n.s.o.s.t.SqlActiveExecutionsMonitor   : [] Refreshing active execution gauges (active: 0)                                                                                                                                                                                                                        |
| 2020-03-27 07:42:01.303 | 2020-03-27 07:42:01.303  INFO 1 --- [    scheduler-5] com.netflix.spinnaker.q.QueueProcessor   : [] Received message StartExecution(executionType=pipeline, executionId=01E4DF1KAV3C5YAZ3XYNMB136F, application=prometheus)                                                                                                                                               |
| 2020-03-27 07:42:01.356 | 2020-03-27 07:42:01.356  INFO 1 --- [      events-16] c.n.s.orca.front50.Front50Service        : [spinnaker-service-account@example.com] ---> HTTP GET http://spin-front50.spinnaker:8080/notifications/application/prometheus                                                                                                                                              |
| 2020-03-27 07:42:01.368 | 2020-03-27 07:42:01.368  INFO 1 --- [    scheduler-5] com.netflix.spinnaker.q.QueueProcessor   : [] Received message StartStage(executionType=pipeline, executionId=01E4DF1KAV3C5YAZ3XYNMB136F, application=prometheus, stageId=01E4DF1KAW677HGGNPVQWVFQVZ)                                                                                                               |
| 2020-03-27 07:42:01.396 |  at com.netflix.spinnaker.security.AuthenticatedRequest.lambda$propagate$0(AuthenticatedRequest.java:92)                                                                                                                                                                                                                                                                  |
| 2020-03-27 07:42:01.396 |  at com.netflix.spinnaker.orca.events.ExecutionListenerAdapter.onApplicationEvent(ExecutionListenerAdapter.java:47)                                                                                                                                                                                                                                                       |
| 2020-03-27 07:42:01.396 |  at com.netflix.spinnaker.orca.events.ExecutionListenerAdapter.onExecutionStarted(ExecutionListenerAdapter.java:57)                                                                                                                                                                                                                                                       |
| 2020-03-27 07:42:01.396 |  at com.netflix.spinnaker.orca.echo.spring.EchoNotifyingExecutionListener.beforeExecution(EchoNotifyingExecutionListener.groovy:61)                                                                                                                                                                                                                                       |
| 2020-03-27 07:42:01.396 |  at com.netflix.spinnaker.orca.echo.spring.EchoNotifyingExecutionListener.addApplicationNotifications(EchoNotifyingExecutionListener.groovy:129)                                                                                                                                                                                                                          |
| 2020-03-27 07:42:01.396 | 2020-03-27 07:42:01.396  WARN 1 --- [      events-16] c.n.s.okhttp.OkHttp3MetricsInterceptor   : [spinnaker-service-account@example.com] Request GET:http://spin-front50.spinnaker:8080/notifications/application/prometheus is missing [X-SPINNAKER-ACCOUNTS] authentication headers and will be treated as anonymous.                                                     |
| 2020-03-27 07:42:01.396 |  at com.netflix.spinnaker.orca.echo.spring.EchoNotifyingExecutionListener$_addApplicationNotifications_closure4.call(EchoNotifyingExecutionListener.groovy)                                                                                                                                                                                                               |
| 2020-03-27 07:42:01.396 |  at com.netflix.spinnaker.orca.echo.spring.EchoNotifyingExecutionListener$_addApplicationNotifications_closure4.doCall(EchoNotifyingExecutionListener.groovy:128)                                                                                                                                                                                                         |
| 2020-03-27 07:42:01.396 |  at com.netflix.spinnaker.okhttp.OkHttp3MetricsInterceptor.intercept(OkHttp3MetricsInterceptor.java:33)                                                                                                                                                                                                                                                                   |
| 2020-03-27 07:42:01.396 | Request from: com.netflix.spinnaker.okhttp.MetricsInterceptor.doIntercept(MetricsInterceptor.java:95)                                                                                                                                                                                                                                                                     |
| 2020-03-27 07:42:01.396 |  at com.netflix.spinnaker.orca.events.ExecutionListenerAdapter.onApplicationEvent(ExecutionListenerAdapter.java:30)                                                                                                                                                                                                                                                       |
| 2020-03-27 07:42:01.397 | 2020-03-27 07:42:01.397  INFO 1 --- [      events-16] c.n.s.orca.front50.Front50Service        : [spinnaker-service-account@example.com] <--- HTTP 200 http://spin-front50.spinnaker:8080/notifications/application/prometheus (40ms)                                                                                                                                       |
| 2020-03-27 07:42:01.419 | 2020-03-27 07:42:01.419  INFO 1 --- [      events-16] c.n.spinnaker.orca.echo.EchoService      : [] ---> HTTP POST http://spin-echo.spinnaker:8089/                                                                                                                                                                                                                       |
| 2020-03-27 07:42:01.444 | 2020-03-27 07:42:01.444  INFO 1 --- [      events-19] c.n.spinnaker.orca.echo.EchoService      : [spinnaker-service-account@example.com] ---> HTTP POST http://spin-echo.spinnaker:8089/                                                                                                                                                                                    |
| 2020-03-27 07:42:01.444 | 2020-03-27 07:42:01.444  INFO 1 --- [      events-16] c.n.spinnaker.orca.echo.EchoService      : [] <--- HTTP 200 http://spin-echo.spinnaker:8089/ (24ms)                                                                                                                                                                                                                 |
| 2020-03-27 07:42:01.454 | 2020-03-27 07:42:01.454  INFO 1 --- [      events-19] c.n.spinnaker.orca.echo.EchoService      : [spinnaker-service-account@example.com] <--- HTTP 200 http://spin-echo.spinnaker:8089/ (9ms)                                                                                                                                                                               |
| 2020-03-27 07:42:01.550 | 2020-03-27 07:42:01.550  INFO 1 --- [    scheduler-5] com.netflix.spinnaker.q.QueueProcessor   : [] Received message RunTask(executionType=pipeline, executionId=01E4DF1KAV3C5YAZ3XYNMB136F, application=prometheus, stageId=01E4DF1KAW677HGGNPVQWVFQVZ, taskId=1, taskType=class com.netflix.spinnaker.orca.clouddriver.tasks.manifest.ResolveDeploySourceManifestTask)  |
| 2020-03-27 07:42:01.581 | 2020-03-27 07:42:01.581  INFO 1 --- [     handlers-6] c.n.s.orca.clouddriver.OortService       : [spinnaker-service-account@example.com] ---> HTTP PUT http://spin-clouddriver-ro.spinnaker:7002/artifacts/fetch/                                                                                                                                                           |
| 2020-03-27 07:42:01.774 | 2020-03-27 07:42:01.774  INFO 1 --- [     handlers-6] c.n.s.orca.clouddriver.OortService       : [spinnaker-service-account@example.com] <--- HTTP 200 http://spin-clouddriver-ro.spinnaker:7002/artifacts/fetch/ (193ms)                                                                                                                                                   |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.q.QueueProcessor$callback$1$1.run(QueueProcessor.kt:89)                                                                                                                                                                                                                                                                                         |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.audit.ExecutionTrackingMessageHandlerPostProcessor$ExecutionTrackingMessageHandlerProxy.invoke(ExecutionTrackingMessageHandlerPostProcessor.kt:69)                                                                                                                                                                                       |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.invoke(RunTaskHandler.kt:73)                                                                                                                                                                                                                                                                                      |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.invoke(OrcaMessageHandler.kt)                                                                                                                                                                                                                                                                    |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.q.MessageHandler$DefaultImpls.invoke(MessageHandler.kt:36)                                                                                                                                                                                                                                                                                      |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.handle(RunTaskHandler.kt:73)                                                                                                                                                                                                                                                                                      |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.handle(RunTaskHandler.kt:88)                                                                                                                                                                                                                                                                                      |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withTask(RunTaskHandler.kt:202)                                                                                                                                                                                                                                                                                   |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withTask(RunTaskHandler.kt:73)                                                                                                                                                                                                                                                                                    |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.withTask(OrcaMessageHandler.kt:60)                                                                                                                                                                                                                                                               |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withStage(RunTaskHandler.kt:73)                                                                                                                                                                                                                                                                                   |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.withStage(OrcaMessageHandler.kt:74)                                                                                                                                                                                                                                                              |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withExecution(RunTaskHandler.kt:73)                                                                                                                                                                                                                                                                               |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$DefaultImpls.withExecution(OrcaMessageHandler.kt:95)                                                                                                                                                                                                                                                          |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withStage$1.invoke(OrcaMessageHandler.kt:46)                                                                                                                                                                                                                                                                  |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withStage$1.invoke(OrcaMessageHandler.kt:85)                                                                                                                                                                                                                                                                  |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withTask$1.invoke(OrcaMessageHandler.kt:46)                                                                                                                                                                                                                                                                   |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.OrcaMessageHandler$withTask$1.invoke(OrcaMessageHandler.kt:68)                                                                                                                                                                                                                                                                   |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$withTask$1.invoke(RunTaskHandler.kt:73)                                                                                                                                                                                                                                                                           |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$withTask$1.invoke(RunTaskHandler.kt:209)                                                                                                                                                                                                                                                                          |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1.invoke(RunTaskHandler.kt:73)                                                                                                                                                                                                                                                                             |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1.invoke(RunTaskHandler.kt:127)                                                                                                                                                                                                                                                                            |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withAuth(RunTaskHandler.kt:73)                                                                                                                                                                                                                                                                                    |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.AuthenticationAware$DefaultImpls.withAuth(AuthenticationAware.kt:51)                                                                                                                                                                                                                                                             |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.security.AuthenticatedRequest.lambda$propagate$0(AuthenticatedRequest.java:92)                                                                                                                                                                                                                                                                  |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.AuthenticationAware$sam$java_util_concurrent_Callable$0.call(AuthenticationAware.kt)                                                                                                                                                                                                                                             |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$2.invoke(RunTaskHandler.kt:73)                                                                                                                                                                                                                                                                           |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$2.invoke(RunTaskHandler.kt:128)                                                                                                                                                                                                                                                                          |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.access$withLoggingContext(RunTaskHandler.kt:73)                                                                                                                                                                                                                                                                   |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler.withLoggingContext(RunTaskHandler.kt:379)                                                                                                                                                                                                                                                                         |
| 2020-03-27 07:42:01.774 | 2020-03-27 07:42:01.774  WARN 1 --- [     handlers-6] c.n.s.okhttp.OkHttp3MetricsInterceptor   : [spinnaker-service-account@example.com] Request PUT:http://spin-clouddriver-ro.spinnaker:7002/artifacts/fetch/ is missing [X-SPINNAKER-ACCOUNTS] authentication headers and will be treated as anonymous.                                                                  |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$2$1.invoke(RunTaskHandler.kt:130)                                                                                                                                                                                                                                                                        |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.clouddriver.tasks.manifest.ResolveDeploySourceManifestTask.execute(ResolveDeploySourceManifestTask.java:45)                                                                                                                                                                                                                                |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.clouddriver.tasks.manifest.ManifestEvaluator.evaluate(ManifestEvaluator.java:102)                                                                                                                                                                                                                                                          |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.clouddriver.tasks.manifest.ManifestEvaluator.getManifests(ManifestEvaluator.java:111)                                                                                                                                                                                                                                                      |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.clouddriver.tasks.manifest.ManifestEvaluator.getManifestsFromArtifact(ManifestEvaluator.java:130)                                                                                                                                                                                                                                          |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.kork.core.RetrySupport.retry(RetrySupport.java:34)                                                                                                                                                                                                                                                                                              |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.clouddriver.tasks.manifest.ManifestEvaluator.lambda$fetchAndParseManifestYaml$2(ManifestEvaluator.java:160)                                                                                                                                                                                                                                |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.clouddriver.DelegatingOortService.fetchArtifact(DelegatingOortService.java:128)                                                                                                                                                                                                                                                            |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.okhttp.OkHttp3MetricsInterceptor.intercept(OkHttp3MetricsInterceptor.java:33)                                                                                                                                                                                                                                                                   |
| 2020-03-27 07:42:01.774 | Request from: com.netflix.spinnaker.okhttp.MetricsInterceptor.doIntercept(MetricsInterceptor.java:95)                                                                                                                                                                                                                                                                     |
| 2020-03-27 07:42:01.774 |  at com.netflix.spinnaker.orca.q.handler.RunTaskHandler$handle$1$2$1.invoke(RunTaskHandler.kt:73)                                                                                                                                                                                                                                                                         |
| 2020-03-27 07:42:04.979 | 2020-03-27 07:42:04.979  INFO 1 --- [pool-2-thread-1] c.n.s.o.s.t.SqlActiveExecutionsMonitor   : [] Refreshing active execution gauges (active: 1)                                                                                                                                                                                                                        |
| 2020-03-27 07:42:33.573 | 2020-03-27 07:42:33.573  INFO 1 --- [    scheduler-3] com.netflix.spinnaker.q.QueueProcessor   : [] Received message RunTask(executionType=pipeline, executionId=01E4DF1KAV3C5YAZ3XYNMB136F, application=prometheus, stageId=01E4DF1KAW677HGGNPVQWVFQVZ, taskId=3, taskType=class com.netflix.spinnaker.orca.clouddriver.tasks.MonitorKatoTask)                           |
| 2020-03-27 07:42:33.717 | 2020-03-27 07:42:33.717  INFO 1 --- [     handlers-7] c.n.s.k.expressions.ExpressionTransform  : [spinnaker-service-account@example.com] Failed to evaluate {                                                                                                                                                                                                               |
| 2020-03-27 07:42:33.718 |     "gnetId": 5450,                                                                                                                                                                                                                                                                                                                                                       |
| 2020-03-27 07:42:33.718 |     "editable": true,                                                                                                                                                                                                                                                                                                                                                     |
| 2020-03-27 07:42:33.718 |     "description": "Alerts from prometheus.",                                                                                                                                                                                                                                                                                                                             |
| 2020-03-27 07:42:33.718 |     },                                                                                                                                                                                                                                                                                                                                                                    |
| 2020-03-27 07:42:33.718 |         ]                                                                                                                                                                                                                                                                                                                                                                 |
| 2020-03-27 07:42:33.718 |             }                                                                                                                                                                                                                                                                                                                                                             |
| 2020-03-27 07:42:33.718 |                 "type": "dashboard"                                                                                                                                                                                                                                                                                                                                       |
| 2020-03-27 07:42:33.718 |                 "name": "Annotations & Alerts",                                                                                                                                                                                                                                                                                                                           |
| 2020-03-27 07:42:33.718 |                 "iconColor": "rgba(0, 211, 255, 1)",                                                                                                                                                                                                                                                                                                                      |
| 2020-03-27 07:42:33.718 |                 "hide": true,                                                                                                                                                                                                                                                                                                                                             |
| 2020-03-27 07:42:33.718 |                 "enable": true,                                                                                                                                                                                                                                                                                                                                           |
| 2020-03-27 07:42:33.718 |                 "datasource": "-- Grafana --",                                                                                                                                                                                                                                                                                                                            |
| 2020-03-27 07:42:33.718 |                 "builtIn": 1,                                                                                                                                                                                                                                                                                                                                             |
| 2020-03-27 07:42:33.718 |             {                                                                                                                                                                                                                                                                                                                                                             |

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 45 (16 by maintainers)

Most upvoted comments

We’re starting to scope out #5676, and the tentative plan there would involve no longer storing the full Kubernetes YAML in the pipeline context. This should address the two related issues here: (1) When Orca SpEL-evaluates the context, it will no longer evaluate the manifest, which what is leading to warnings for any non-escaped $. There will still be an option to do a single pass of SpEL evaluation on the manifest before deploying, if desired, but we’ll eliminate the extraneous evaluations (most of which happen after the manifest was deployed anyway, and which can’t be disabled). (2) This will greatly reduce the size of the pipeline context for pipelines that deploy a lot of manifests, which should avoid issues loading the executions tab for these, as well as request-size limits trying to save the execution context.

Obviously there are a lot of details to be worked out around exactly how that will work, but that’s the current plan for how to resolve this. I’ll leave this issue open at least for now, as it’s worth users being able to find this issue and see the plan for resolution, but eventually will close it as a duplicate of the above-mentioned issue which is where we’ll actually be fixing this.

After some additional tests I saw my previous workaround only works the first time the manifests are deployed. If I try to deploy an upgrade of the application or even redeploy the same YAML, it always fails because the deployment monitor tries to evaluate the current configmap already applied (whose values were evaluated in the first deploy).

So, right now this is critical to be fixed soon to us. I cannot use it to deploy our applications in my organization before this is solved. We need some help, please 🙏

BTW, we are using Spinnaker 1.20.5.

Hi all - this has popped up for us once a week or so for a while. As a short term solution could we make the difference between the error and spel warnings more clear in the UI? For example this error shows when a deployment never becomes healthy:


Exception ( Wait For Manifest To Stabilize ) ‘deployment thing-canary’ in ‘mynamespace’ for account my-cluster: Deployment exceeded its progress deadline. Failed to evaluate [kubectl.kubernetes.io/last-applied-… etc


The “Failed to evaluate…” is a configmap with an environment variable like MY_ENV=“$MY_ENV2”. That is a warning around spel, it creates an error message over 100 lines long (its a large configmap).

Maybe some headers and color coding to show the difference between the error/warnings would help:

Exception ( Wait For Manifest To Stabilize ) ‘deployment thing-canary’ in ‘mynamespace’ for account my-cluster: Deployment exceeded its progress deadline.

Warnings: Failed to evaluate [kubectl.kubernetes.io/last-applied-… etc


Thoughts?

Thanks very much Eric.

Can confirm that the SPeL portion does appear to be warnings (and logged manifests) only.

Have added another cluster (and wall of yaml to context), which resulted in hitting max_allowed_packet limit in Aurora again. Have bumped that and will monitor for failures.

Will post any new log findings as arise and follow the linked issue. Thanks again.

The part about the pipeline failing is surprising, and I suspect it’s not related to the SpEL issue and instead is (as you suggest) related to the amount of data we need to send back and forth between the microservices exceeding request sizes. It looks like the error in your log above is just the SpEL error, which should be a warning—is there anything else in any of the logs (particularly orca or clouddriver) that might indicate why the pipeline is failing.

If it is indeed the amount of data in the pipeline context, then the same issue (#5676) will fix that as well as we will stop putting entire manifests in the context. But hopefully with more information on the error we can at least find a workaround (such as increasing maxPostSize) for the meantime, as it will still be a release or two until that issue gets fully solved.