pipelines: [backend] caching fails when task pods have `WorkflowTaskResults` RBAC (argo 3.4 change)
Issue
When the ServiceAccount used by argo-workflows task Pods has RBAC permission to create WorkflowTaskResults resources, argo-workflows changes how it stores task outputs: it writes them to these custom resources instead of patching the Pod to store them in annotations, as it did previously.
Kubeflow Pipelines does not know about the WorkflowTaskResults CRD and fails in a strange way. Workflows run successfully the FIRST time they are run (when there is no cache), but on every subsequent run all tasks fail with the message `This step is in Error state with this message: unexpected end of JSON input`.
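To check whether a given ServiceAccount is affected, it can help to inspect the rules of the Roles bound to it. The following is a minimal sketch, assuming the rules have already been loaded as dictionaries shaped like Kubernetes `PolicyRule` objects (`apiGroups`, `resources`, `verbs`). The function name and the sample rules are hypothetical, and the check ignores `resourceNames` and aggregated roles.

```python
from typing import Iterable, Mapping, Sequence

# WorkflowTaskResult is served from the Argo Workflows API group.
ARGO_API_GROUP = "argoproj.io"
TASK_RESULT_RESOURCE = "workflowtaskresults"


def _matches(values: Sequence[str], wanted: str) -> bool:
    """Return True if the list names the wanted value or uses the '*' wildcard."""
    return "*" in values or wanted in values


def grants_task_result_create(rules: Iterable[Mapping[str, Sequence[str]]]) -> bool:
    """Hypothetical helper: True if any rule allows creating WorkflowTaskResults."""
    for rule in rules:
        if (
            _matches(rule.get("apiGroups", []), ARGO_API_GROUP)
            and _matches(rule.get("resources", []), TASK_RESULT_RESOURCE)
            and _matches(rule.get("verbs", []), "create")
        ):
            return True
    return False


# Illustrative rule sets (assumptions, not taken from any Kubeflow manifest).
pod_patch_only = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "patch"]},
]
with_task_results = pod_patch_only + [
    {
        "apiGroups": ["argoproj.io"],
        "resources": ["workflowtaskresults"],
        "verbs": ["create", "patch"],
    },
]

if __name__ == "__main__":
    print(grants_task_result_create(pod_patch_only))
    print(grants_task_result_create(with_task_results))
```

A ServiceAccount whose rules pass this check would be expected to trigger the WorkflowTaskResults behaviour described above.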
Here is an issue from someone with this problem: https://github.com/kubeflow/pipelines/issues/8842
I expect we will see MANY more reports of this issue soon, as argo-workflows 3.4+ uses WorkflowTaskResults by default, and people will try to update from our packaged 3.3 version.
Impacted by this bug? Give it a 👍.
About this issue
- State: open
- Created a year ago
- Reactions: 21
- Comments: 15 (11 by maintainers)
This issue is still present, and will prevent Kubeflow from being used with Argo Workflows 3.4+.
@chensun even if we don’t upgrade our Argo to 3.4, we really need to make Kubeflow Pipelines aware of `WorkflowTaskResults`, because as described in https://github.com/kubeflow/pipelines/issues/8942#issuecomment-1550381700, if the `ServiceAccount/default-editor` has permission to create `WorkflowTaskResults`, argo-workflows WILL use them instead of the `workflows.argoproj.io/outputs` Pod annotation which Kubeflow expects.
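The failure mode can be illustrated with a small sketch. The error text `unexpected end of JSON input` matches what Go's `encoding/json` returns when asked to parse empty input, which would be consistent with a consumer parsing an absent annotation as JSON; that reading is an assumption, not something confirmed from the Kubeflow Pipelines source. The Python helper below is hypothetical and only shows the defensive pattern: treat a missing `workflows.argoproj.io/outputs` annotation as "no outputs recorded here" rather than parsing an empty string.

```python
import json
from typing import Any, Mapping, Optional

# Annotation that Kubeflow Pipelines expects on the task Pod (from the issue text).
OUTPUTS_ANNOTATION = "workflows.argoproj.io/outputs"


def read_outputs_annotation(annotations: Mapping[str, str]) -> Optional[Any]:
    """Hypothetical helper: parse task outputs from Pod annotations.

    Returns None when the annotation is absent or empty, which is what a
    consumer would see if the outputs were written to a WorkflowTaskResult
    instead of the Pod.
    """
    raw = annotations.get(OUTPUTS_ANNOTATION, "")
    if not raw:
        # Nothing to parse; the caller must look elsewhere or skip caching.
        return None
    return json.loads(raw)


if __name__ == "__main__":
    # Illustrative annotation payload; the exact shape is an assumption.
    pod_with_outputs = {OUTPUTS_ANNOTATION: '{"parameters": [{"name": "x", "value": "1"}]}'}
    pod_without_outputs = {}

    print(read_outputs_annotation(pod_with_outputs))
    print(read_outputs_annotation(pod_without_outputs))
```

Returning `None` only avoids the parse error; making caching work would still require reading the outputs from the `WorkflowTaskResults` resources.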