pipeline: Pipelines with a finally clause in them make tekton-pipelines-webhook spam conversion errors between v1alpha1 and v1beta1

Expected Behavior

Pipelines with finally: in them should not make the tekton-pipelines-webhook spam conversion errors.

Actual Behavior

When applying any Pipeline that includes finally: in it, the tekton-pipelines-webhook starts spamming these conversion errors:

{
  "level": "info",
  "logger": "webhook",
  "caller": "webhook/conversion.go:42",
  "msg": "Webhook ServeHTTP request=&http.Request{Method:\"POST\", URL:(*url.URL)(0xc000714280), Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Accept\":[]string{\"application/json, */*\"}, \"Accept-Encoding\":[]string{\"gzip\"}, \"Content-Length\":[]string{\"2019\"}, \"Content-Type\":[]string{\"application/json\"}, \"User-Agent\":[]string{\"kube-apiserver-admission\"}}, Body:(*http.body)(0xc00084d7c0), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:2019, TransferEncoding:[]string(nil), Close:false, Host:\"tekton-pipelines-webhook.tekton-pipelines.svc:443\", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:\"100.109.0.3:60412\", RequestURI:\"/resource-conversion?timeout=30s\", TLS:(*tls.ConnectionState)(0xc000788370), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc00084d800)}",
  "commit": "a162a1d"
}
{
  "level": "info",
  "logger": "webhook",
  "caller": "conversion/conversion.go:133",
  "msg": "Converting [kind=Pipeline group=tekton.dev version=v1beta1] to version tekton.dev/v1alpha1",
  "commit": "a162a1d",
  "uid": "b2b3d9a8-e22b-4724-8105-61ccabaa9bf5",
  "desiredAPIVersion": "tekton.dev/v1alpha1",
  "inputType": "[kind=Pipeline group=tekton.dev version=v1beta1]",
  "outputType": "[kind=Pipeline group=tekton.dev version=v1alpha1]",
  "hubType": "[kind=Pipeline group=tekton.dev version=v1alpha1]",
  "knative.dev/key": "tekton-pipelines/clone-cleanup-workspace"
}
{
  "level": "error",
  "logger": "webhook",
  "caller": "conversion/conversion.go:59",
  "msg": "Conversion failed: conversion failed to version v1alpha1 for type [kind=Pipeline group=tekton.dev version=v1beta1] -  the specified field/section is not available in v1alpha1",
  "commit": "a162a1d",
  "uid": "b2b3d9a8-e22b-4724-8105-61ccabaa9bf5",
  "desiredAPIVersion": "tekton.dev/v1alpha1",
  "stacktrace": "github.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook/resourcesemantics/conversion.(*reconciler).Convert\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook/resourcesemantics/conversion/conversion.go:59\ngithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook.conversionHandler.func1\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook/conversion.go:61\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2012\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2387\ngithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook.(*Webhook).ServeHTTP\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook/webhook.go:259\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2807\nnet/http.(*conn).serve\n\tnet/http/server.go:1895"
}

Steps to Reproduce the Problem

  1. Create a new minikube cluster
  2. Apply the latest tekton-pipelines release: 0.16.0
  3. Apply any Pipeline with a finally: clause in it. I used the example you have available here: tekton-pipelinerun-with-final-task (a minimal sketch is shown below).
  4. Check the logs of the tekton-pipelines-webhook pod and observe the errors above, repeating non-stop.
  5. Remove the YAML applied in step 3 and the log spam stops.
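
For reference, a Pipeline along these lines is enough to reproduce it. This is a minimal sketch, not the exact upstream example; the Task names referenced are hypothetical and would need to exist in the cluster:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: pipeline-with-finally
spec:
  tasks:
    - name: main-work
      taskRef:
        name: some-task        # hypothetical Task
  finally:                     # this section has no equivalent in v1alpha1,
    - name: cleanup            # so conversion down to v1alpha1 fails
      taskRef:
        name: cleanup-task     # hypothetical Task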

Additional Info

I have also tested this on our AKS cluster in Azure, with the same results as in minikube. Versions tested that showed the same behavior: 0.15.0, 0.15.1, 0.15.2, and 0.16.0.

  • Kubernetes version:

    Output of kubectl version:

$ k version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:30:10Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:23:04Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}

  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

$ tkn version
Client version: 0.11.0
Pipeline version: v0.16.0
Triggers version: unknown

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 72 (48 by maintainers)

Most upvoted comments

The kube api server is repeatedly attempting to ListAndWatch v1alpha1 versions. Here are the logs from my api server in kind:

E0211 16:12:20.440641       1 watcher.go:318] failed to prepare current and previous objects: conversion webhook for tekton.dev/v1beta1, Kind=Pipeline failed: conversion failed to version v1alpha1 for type [kind=Pipeline group=tekton.dev version=v1beta1] -  the specified field/section is not available in v1alpha1
W0211 16:12:20.440728       1 reflector.go:436] storage/cacher.go:/tekton.dev/pipelines: watch of tekton.dev/v1alpha1, Kind=Pipeline ended with: Internal error occurred: conversion webhook for tekton.dev/v1beta1, Kind=Pipeline failed: conversion failed to version v1alpha1 for type [kind=Pipeline group=tekton.dev version=v1beta1] -  the specified field/section is not available in v1alpha1
E0211 16:12:21.443946       1 cacher.go:419] cacher (*unstructured.Unstructured): unexpected ListAndWatch error: failed to list tekton.dev/v1alpha1, Kind=Pipeline: conversion webhook for tekton.dev/v1beta1, Kind=Pipeline failed: conversion failed to version v1alpha1 for type [kind=Pipeline group=tekton.dev version=v1beta1] -  the specified field/section is not available in v1alpha1; reinitializing...
E0211 16:12:22.447025       1 cacher.go:419] cacher (*unstructured.Unstructured): unexpected ListAndWatch error: failed to list tekton.dev/v1alpha1, Kind=Pipeline: conversion webhook for tekton.dev/v1beta1, Kind=Pipeline failed: conversion failed to version v1alpha1 for type [kind=Pipeline group=tekton.dev version=v1beta1] -  the specified field/section is not available in v1alpha1; reinitializing...
E0211 16:12:23.450160       1 cacher.go:419] cacher (*unstructured.Unstructured): unexpected ListAndWatch error: failed to list tekton.dev/v1alpha1, Kind=Pipeline: conversion webhook for tekton.dev/v1beta1, Kind=Pipeline failed: conversion failed to version v1alpha1 for type [kind=Pipeline group=tekton.dev version=v1beta1] -  the specified field/section is not available in v1alpha1; reinitializing...

@vdemeester @pritidesai should we maybe not return a conversion error when moving down to v1alpha1? It doesn’t seem to be something the k8s api server expects or handles gracefully.

Edit to add: We could simply discard the finally section when down-converting to v1alpha1?

Edit to add: Here’s the kubernetes version I’m running in my kind cluster:

Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:28:09Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-21T01:11:42Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

This happened in a brand new kind cluster on a brand new VM. It definitely looks like either we shouldn’t be returning the error at all or we’re somehow violating expectations that kubernetes has for the way conversions work.
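
To illustrate the “simply discard” idea: if the webhook dropped finally on the way down instead of returning an error, requesting the same Pipeline as v1alpha1 would presumably just come back without the finally tasks. A hypothetical sketch of such a down-converted object (this is not what the webhook does today):

apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: pipeline-with-finally
spec:
  tasks:
    - name: main-work
      taskRef:
        name: some-task
  # the finally tasks are silently dropped; v1alpha1 has no field to carry them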

I am not able to reproduce this even on 0.16.3 now 😞. I am running pipelines on my local Kubernetes cluster. I tried resetting my cluster 😭 but it didn’t help.

How often these conversion errors are reported in the webhook logs?

All the time non-stop spam 😄. You will not miss it if you hit this issue.

@pritidesai We are using Tekton on OpenShift as part of the Red Hat OpenShift Pipeline Operator. The latest version of this operator is using Tekton Pipelines v0.16.3. For us it’s not possible to upgrade to the latest available Tekton Pipelines version, as we are using the Red Hat implementation of Tekton.

@Yannig this would remove support for v1alpha1 though (as it wouldn’t be served anymore…). I don’t see any reason why it would be magically fixed in 0.19 😓

thanks @Yannig appreciate the workaround, but wouldn’t work for folks using v1alpha1 resources along with v1beta1.

@vdemeester any updates on this one? will this be magically fixed in 0.19? 🤞

Also, there is a warning to avoid conversion failures, not sure if it applies to the way we have implemented conversions.

Nice find @pritidesai ! I think that warning is enough evidence to me that we should simply be removing Finally from the v1alpha1 resources we convert down to. In a way this does make sense - anyone relying on v1alpha1 should have no expectation that their Finally entries would be available since the feature simply didn’t exist then.

It also sounds like this is only an issue because kubernetes is caching the CRD apiVersions we list in our CRDs. So at some point we may remove v1alpha1 from our CRDs and at that time we should expect that kubernetes will no longer be hitting our conversion endpoints for that version.

Having written all that it seems like https://github.com/tektoncd/pipeline/pull/3757 is the right way forward. I’ll update the tests and get that PR into mergeable shape. Thanks a lot for finding all that info!

Ok… it’s not tekton-pipelines-controller. I just stopped the deployment and still see the conversions. In retrospect, I should have checked this earlier 🤦‍♂️

I still don’t know what’s causing this in my case though…

This seems like a really important point. @ljupchokotev, @wouter2397, @coryrc - are any of you able to reproduce this? If you stop the pipelines controller deployment, do you continue to see the error messages?

At this point I’m not quite sure how to debug further if we’re not able to nail down some clearer steps to reproduce. If the above is true re: the deployment, then it seems possible that the error messages are actually being generated by something that isn’t related to the pipelines controller? It would be great to get some feedback confirming whether this is the case for other folks.

All the time non-stop spam 😄. You will not miss it if you hit this issue.

😭

Ok… it’s not tekton-pipelines-controller. I just stopped the deployment and still see the conversions. In retrospect, I should have checked this earlier 🤦‍♂️

I still don’t know what’s causing this in my case though…

We were facing this with v0.19.0 and I just updated one cluster to v0.20.1 and still see this in the webhook log

@wouter2397 I don’t have any solution yet. I am adding this in next milestone to make sure we fix this. I will resume troubleshooting next week.

cc @dibyom @khrm is Triggers still using the v1alpha1 API? 🤔

The pipelines v1alpha1 API? Users can create resources with that APIVersion but they don’t have to. As far as I know, there shouldn’t be a direct dependency on the v1alpha1 API (we use a dynamic client).

In order to correct the problem, I made the modification suggested by @freefood89. If it helps anyone, it’s a matter of making the following change in the tekton manifest:

diff --git a/tekton/original-tekton-v0.18.1.yaml b/tekton/original-tekton-v0.18.1.yaml
index 23b7e02..f1c6fd1 100644
--- a/tekton/original-tekton-v0.18.1.yaml
+++ b/tekton/original-tekton-v0.18.1.yaml
@@ -632,10 +632,9 @@ spec:
   group: tekton.dev
   preserveUnknownFields: false
   versions:
-    - &version
-      name: v1alpha1
+    - name: v1beta1
       served: true
-      storage: false
+      storage: true
       # Opt into the status subresource so metadata.generation
       # starts to increment
       subresources:
@@ -651,9 +650,6 @@ spec:
           # See https://kubernetes.io/blog/2019/06/20/crd-structural-schema/
           # See issue: https://github.com/knative/serving/issues/912
           x-kubernetes-preserve-unknown-fields: true
-    - !!merge <<: *version
-      name: v1beta1
-      storage: true
   names:
     kind: Pipeline
     plural: pipelines

Well, I don’t know if you already know this, but the issue is still there with 0.18.1. I get the same kind of spam over and over until I remove the finally statement from my pipeline.

@pritidesai because it’s two commands, it makes sense that those “age” values are different. They would also be different if you asked for the same resource version twice 😉

@freefood89 🤗 Yeah I need to dig deeper into this. I have yet to understand why the webhook or controller would, on its own, request a v1alpha1 version or a conversion… I might be missing something obvious in the code…

Nope, I didn’t explicitly request that particular pipeline; just listing v1alpha1 pipelines with kubectl get pipelines.v1alpha1.tekton.dev shows that beta pipeline converted into alpha as well. Unless the listing command is implicitly converting pipelines, which sounds weird.

Yeah, that’s what should happen: when you list pipelines.v1alpha1.tekton.dev, it will get whatever is stored and convert it “on the fly” to a v1alpha1 version if it can.

The “age” difference in the listing feels weird though…

So whenever you get resources from k8s, the API is called with the resource path:

GET /apis/GROUP/VERSION/namespaces/NAMESPACE/RESOURCETYPE/NAME

I’m guessing that’s where the desiredAPIVersion comes from (???). I was able to replicate it with kubectl get pipelines.v1alpha1.tekton.dev, where k8s will call the conversion webhook to convert my v1beta1 pipelines to v1alpha1.

(I’m learning this as I go too haha)
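
As a concrete example (namespace made up for illustration), listing the pipelines as v1alpha1 boils down to a call like:

GET /apis/tekton.dev/v1alpha1/namespaces/default/pipelines

which the api server serves by reading the stored v1beta1 objects and asking the conversion webhook for a v1alpha1 representation of each.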

@pritidesai I learned that the v1beta1 entry in the CRD is actually whatever v1alpha1 is, with:

name: v1beta1
storage: true 

as overrides. Apparently that’s what &version with <<: *version does. Didn’t know YAML had these capabilities.

resulting in:

versions: 
- name: v1alpha1 
  served: true 
  storage: false 
  # Opt into the status subresource so metadata.generation 
  # starts to increment 
  subresources: 
    status: {} 
  schema: 
    openAPIV3Schema: 
      type: object 
      # One can use x-kubernetes-preserve-unknown-fields: true 
      # at the root of the schema (and inside any properties, additionalProperties) 
      # to get the traditional CRD behaviour that nothing is pruned, despite 
      # setting spec.preserveUnknownProperties: false. 
      # 
      # See https://kubernetes.io/blog/2019/06/20/crd-structural-schema/ 
      # See issue: https://github.com/knative/serving/issues/912 
      x-kubernetes-preserve-unknown-fields: true 
- name: v1beta1 
  storage: true 
  served: true 
  # Opt into the status subresource so metadata.generation 
  # starts to increment 
  subresources: 
    status: {} 
  schema: 
    openAPIV3Schema: 
      type: object 
      # One can use x-kubernetes-preserve-unknown-fields: true 
      # at the root of the schema (and inside any properties, additionalProperties) 
      # to get the traditional CRD behaviour that nothing is pruned, despite 
      # setting spec.preserveUnknownProperties: false. 
      # 
      # See https://kubernetes.io/blog/2019/06/20/crd-structural-schema/ 
      # See issue: https://github.com/knative/serving/issues/912 
      x-kubernetes-preserve-unknown-fields: true 
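
For anyone puzzled by the anchor syntax, here is a simplified illustration of how that merge works (not the full Tekton manifest; merge keys are a YAML 1.1 convention that most parsers support):

versions:
  - &version          # "&version" anchors the whole v1alpha1 entry
    name: v1alpha1
    served: true
    storage: false
  - <<: *version      # "<<: *version" merges the anchored keys in...
    name: v1beta1     # ...and keys set locally override the merged ones
    storage: true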

@coryrc so Prow “might” query the Tekton API of this cluster?

Yes, unfortunately I can’t rule that out because I am not sufficiently knowledgeable in this domain.

Yeah, if it’s on a brand new minikube without any API consumer (no UI, …), then it might be an internal problem (something from the controller still making v1alpha1 calls…)

Sometimes I think back to 5-pages of Java stacktrace fondly…