pipelines: [backend] caching fails when task pods have `WorkflowTaskResults` RBAC (argo 3.4 change)
Issue
When the ServiceAccount used by argo-workflows task Pods has the RBAC to create `WorkflowTaskResults` resources, argo-workflows changes how it stores task outputs: it stores them in these CRDs instead of patching the Pod to store them in annotations, as it did previously.
Kubeflow Pipelines does not know about the `WorkflowTaskResults` CRD and fails in a strange way. Workflows run successfully the FIRST time (when there is no cache), but on every subsequent run all tasks fail with the message `This step is in Error state with this message: unexpected end of JSON input`.
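`unexpected end of JSON input` is the error Go's `encoding/json` returns for empty (or truncated) input. The sketch below shows one plausible mechanism consistent with the symptom, assuming the code on the cached path reads the `workflows.argoproj.io/outputs` annotation without checking that it exists. In Go, looking up a missing map key returns `""`, and unmarshalling `""` fails with exactly this message. The helper `parseOutputs` and the trimmed `Outputs`/`Parameter` types are hypothetical illustrations, not Kubeflow Pipelines' actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// outputsAnnotation is the Pod annotation Kubeflow Pipelines expects Argo to
// populate with a task's outputs.
const outputsAnnotation = "workflows.argoproj.io/outputs"

// Parameter and Outputs are hypothetical, trimmed-down stand-ins for Argo's
// output types; only what this sketch needs is modelled.
type Parameter struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

type Outputs struct {
	Parameters []Parameter `json:"parameters,omitempty"`
}

// parseOutputs is a hypothetical helper that reads the annotation without
// checking that it exists: a missing key yields "", and unmarshalling empty
// input fails.
func parseOutputs(annotations map[string]string) (*Outputs, error) {
	var out Outputs
	if err := json.Unmarshal([]byte(annotations[outputsAnnotation]), &out); err != nil {
		return nil, err
	}
	return &out, nil
}

func main() {
	// Annotation present (pre-3.4 behaviour, or no WorkflowTaskResults RBAC).
	out, err := parseOutputs(map[string]string{
		outputsAnnotation: `{"parameters":[{"name":"x","value":"1"}]}`,
	})
	fmt.Println(out.Parameters[0].Value, err) // 1 <nil>

	// Annotation absent because the outputs went to a WorkflowTaskResult.
	_, err = parseOutputs(map[string]string{})
	fmt.Println(err) // unexpected end of JSON input
}
```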
Here is an issue from someone with this problem: https://github.com/kubeflow/pipelines/issues/8842
I expect we will see MANY more reports of this issue soon, as argo-workflows 3.4+ uses `WorkflowTaskResults` by default, and people will try to update from our packaged 3.3 version.
About this issue
- State: open
- Created a year ago
- Reactions: 21
- Comments: 15 (11 by maintainers)
This issue is still present, and will prevent Kubeflow from being used with Argo Workflows 3.4+.
@chensun even if we don’t upgrade our Argo to 3.4, we really need to make Kubeflow Pipelines aware of `WorkflowTaskResults`, because, as described in https://github.com/kubeflow/pipelines/issues/8942#issuecomment-1550381700, if the `ServiceAccount/default-editor` has permission to create `WorkflowTaskResults`, Argo WILL use them instead of the `workflows.argoproj.io/outputs` Pod annotation that Kubeflow expects.