pipelines: [backend] caching fails when task pods have `WorkflowTaskResults` RBAC (argo 3.4 change)

Issue

When the ServiceAccount used by argo-workflows task Pods has RBAC permission to create WorkflowTaskResults resources, argo-workflows changes how it stores task outputs: it writes them to these custom resources, whereas it previously patched the Pod and stored them in annotations.
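
To check whether a given Role would trigger this behavior, the rule match can be sketched as below. This is a minimal illustration in Python, not Kubeflow or Argo code: the function name is hypothetical, the rules are plain dicts shaped like Kubernetes RBAC rules, and it assumes the resource is named `workflowtaskresults` in the `argoproj.io` API group. It ignores `resourceNames` and aggregated roles.

```python
def grants_task_result_create(rules):
    """Return True if any RBAC rule allows `create` on workflowtaskresults.

    `rules` is a list of dicts shaped like the `rules` field of a Role or
    ClusterRole (hypothetical helper, for illustration only).
    """
    for rule in rules:
        groups = rule.get("apiGroups") or []
        resources = rule.get("resources") or []
        verbs = rule.get("verbs") or []
        if (
            ("argoproj.io" in groups or "*" in groups)
            and ("workflowtaskresults" in resources or "*" in resources)
            and ("create" in verbs or "*" in verbs)
        ):
            return True
    return False


# Example rule sets: one that only allows patching Pods, one that also
# allows creating WorkflowTaskResults.
pods_only = [{"apiGroups": [""], "resources": ["pods"], "verbs": ["patch"]}]
with_results = pods_only + [
    {
        "apiGroups": ["argoproj.io"],
        "resources": ["workflowtaskresults"],
        "verbs": ["create", "patch"],
    }
]
```

With these inputs, `grants_task_result_create(pods_only)` is false and `grants_task_result_create(with_results)` is true, so only the second ServiceAccount would be expected to hit the problem described here.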

Kubeflow Pipelines does not know about the WorkflowTaskResults CRD, and it fails in a confusing way. Workflows run successfully the FIRST time they are run (when there is no cache), but on every subsequent run, all tasks fail with the message: This step is in Error state with this message: unexpected end of JSON input.
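
For illustration, here is one plausible reading of that error, which this issue does not confirm: something reads the outputs annotation from a Pod that never had it written, gets an empty string, and tries to parse it as JSON. Go's `encoding/json` reports empty input as "unexpected end of JSON input"; the Python sketch below raises `json.JSONDecodeError` in the same situation. The function name and the dict-shaped Pod are assumptions, not Kubeflow Pipelines code.

```python
import json

OUTPUTS_ANNOTATION = "workflows.argoproj.io/outputs"


def read_outputs_from_annotation(pod):
    """Parse task outputs from the Pod annotation, with no fallback.

    Hypothetical helper: `pod` is a plain dict shaped like a Pod manifest.
    """
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    # A missing annotation becomes an empty string, which is not valid JSON.
    raw = annotations.get(OUTPUTS_ANNOTATION, "")
    return json.loads(raw)


# A Pod whose outputs went to a WorkflowTaskResult has no outputs annotation.
pod_without_annotation = {"metadata": {"annotations": {}}}

try:
    read_outputs_from_annotation(pod_without_annotation)
    parse_failed = False
except json.JSONDecodeError:
    parse_failed = True
```

Here `parse_failed` ends up true, which mirrors how an empty outputs value could surface as a JSON parsing error rather than as a clear "outputs missing" message.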

Here is an issue from someone with this problem: https://github.com/kubeflow/pipelines/issues/8842

I expect we will see MANY more reports of this issue soon, as argo-workflows 3.4+ uses WorkflowTaskResults by default, and people will try to update from our packaged 3.3 version.

Impacted by this bug? Give it a 👍.

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Reactions: 21
  • Comments: 15 (11 by maintainers)

Most upvoted comments

This issue is still present, and will prevent Kubeflow from being used with Argo Workflows 3.4+.

@chensun even if we don’t upgrade our Argo to 3.4, we really need to make Kubeflow Pipelines aware of WorkflowTaskResults. As described in https://github.com/kubeflow/pipelines/issues/8942#issuecomment-1550381700, if ServiceAccount/default-editor has permission to create WorkflowTaskResults, Argo WILL use them instead of the workflows.argoproj.io/outputs Pod annotation, which Kubeflow expects.
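
As a rough sketch of what "aware of WorkflowTaskResults" could mean, the lookup below prefers outputs from a task result and falls back to the Pod annotation. It is written in Python for brevity and is not a proposed patch: the function name is hypothetical, both objects are plain dicts, and it assumes the WorkflowTaskResult carries its outputs under a top-level `outputs` key.

```python
import json

OUTPUTS_ANNOTATION = "workflows.argoproj.io/outputs"


def resolve_outputs(pod, task_result=None):
    """Return task outputs as a dict, or None when neither source has them.

    Hypothetical helper: `pod` and `task_result` are plain dicts.
    """
    # Assumed shape: outputs live under a top-level "outputs" key.
    if task_result and task_result.get("outputs") is not None:
        return task_result["outputs"]

    # Fall back to the annotation written by older setups.
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    raw = annotations.get(OUTPUTS_ANNOTATION)
    if raw:
        return json.loads(raw)

    # Neither source is populated: report "no outputs" instead of parsing "".
    return None
```

Returning `None` when both sources are empty keeps the "no outputs recorded" case distinct from a JSON parsing failure, so a caller could skip caching that step rather than store an empty value.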