pipelines: Official artifact passing example fails with PNS executor
What steps did you take:
I am using the official example https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml, which runs fine out of the box with the Argo docker executor.
Then I changed the executor to pns:
apiVersion: v1
data:
  config: |
    {
    namespace: kubeflow,
    containerRuntimeExecutor: pns,
    executorImage: gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance,
    ...
What happened:
Every pipeline that passes outputs (including the official example) is now failing. The problem seems to be that the main container exits properly and the wait container can no longer chroot into it:
"executor error: could not chroot into main for artifact collection: container may have exited too quickly"
The docker executor works around this by abusing docker.sock to copy the outputs from the terminated main container, which is obviously completely infeasible in production.
The funny thing is that you can manually mount an emptyDir under /tmp/outputs and add the proper output path (e.g. tmp/outputs/numbers/data) to op.output_artifact_paths:
def add_emptydir(op):
    from kubernetes import client as k8s_client
    # Mount an emptyDir at /tmp/outputs so the wait sidecar can pick up the
    # outputs via mirrored volume mounts instead of chroot'ing into main.
    op.add_volume(k8s_client.V1Volume(name='outputs', empty_dir=k8s_client.V1EmptyDirVolumeSource()))
    op.container.add_volume_mount(k8s_client.V1VolumeMount(name='outputs', mount_path='/tmp/outputs'))
    op.output_artifact_paths = {
        'mlpipeline-ui-metadata': 'tmp/outputs/mlpipeline-ui-metadata.json',
        'mlpipeline-metrics': 'tmp/outputs/mlpipeline-metrics.json',
        'extract-as-artifact': 'tmp/outputs/numbers/data',
    }
    return op
Then the output file (tmp/outputs/numbers/data) is successfully extracted via the mirrored mounts functionality, but extracting the same file with chroot fails.
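For reference, here is a minimal sketch of how such a transformer can be applied to every op in a pipeline (the pipeline function and its name are illustrative, not part of the original example):

from kfp import dsl

@dsl.pipeline(name='artifact-passing-with-emptydir')
def my_pipeline():
    # ... instantiate the pipeline's ops here ...
    # Apply the emptyDir workaround to every op in this pipeline.
    dsl.get_pipeline_conf().add_op_transformer(add_emptydir)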
What did you expect to happen:
I expect the PNS executor to extract the outputs successfully.
Environment:
I tried Kubeflow Pipelines on Kubernetes 1.17 (Azure) and 1.18 (minikube) clusters, both with Docker as the container engine.
How did you deploy Kubeflow Pipelines (KFP)?
Download and extract https://github.com/kubeflow/pipelines/archive/1.0.0.zip. Install with:
kubectl apply -k '/home/julius/Schreibtisch/kubeflow/pipelines-1.0.0/manifests/kustomize/cluster-scoped-resources'
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k '/home/julius/Schreibtisch/kubeflow/pipelines-1.0.0/manifests/kustomize/env/dev'
KFP version: I am using the 1.0.0 release https://github.com/kubeflow/pipelines/releases/tag/1.0.0.
KFP SDK version:
[julius@julius-asus ~]$ pip list | grep kfp
kfp 1.0.0
kfp-server-api 1.0.0
Anything else you would like to add:
I also experimented with op.file_outputs, without success (a rough sketch of that attempt is below).
I also experimented with an emptyDir and the k8sapi executor, without success.
I tried newer Argo workflow and exec images (2.8.3 and 2.9.3 in deployment/workflow-controller), without success.
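For reference, the op.file_outputs experiment looked roughly like this (a minimal sketch; the image, command, and op name are illustrative):

from kfp import dsl

def write_numbers_op():
    # Plain ContainerOp that writes a file and declares it via file_outputs.
    return dsl.ContainerOp(
        name='write-numbers',
        image='python:3.7',
        command=['sh', '-c'],
        arguments=['mkdir -p /tmp/outputs/numbers && seq 0 9 > /tmp/outputs/numbers/data'],
        file_outputs={'numbers': '/tmp/outputs/numbers/data'},
    )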
So I am wondering why PNS works for others.
Besides the official examples, I am also using some very simple pipelines, e.g.:
from kfp.components import func_to_container_op, OutputPath

@func_to_container_op
def write_numbers_1(numbers_path: OutputPath(str), start: int = 0, count: int = 10):
    '''Write numbers to file'''
    import time, datetime
    time.sleep(30)  # should not be necessary with newer versions of argo
    print('numbers_path:', numbers_path)
    with open(numbers_path, 'w') as writer:
        for i in range(start, count):
            writer.write(str(i) + '\n')
    print('finished', datetime.datetime.now())
which work perfectly fine with the docker executor and fail miserably with PNS.
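For completeness, here is a minimal sketch of how this component is wired into a pipeline that passes the artifact to a downstream step (the consumer component and pipeline name are illustrative, not part of the original report):

from kfp import dsl
from kfp.components import func_to_container_op, InputPath

@func_to_container_op
def print_numbers(numbers_path: InputPath(str)):
    # Consume the artifact written by the upstream step.
    with open(numbers_path) as reader:
        print(reader.read())

@dsl.pipeline(name='numbers-artifact-passing')
def numbers_pipeline(count: int = 10):
    write_task = write_numbers_1(count=count)
    # The 'numbers' output is passed to the consumer as an input artifact.
    print_numbers(write_task.outputs['numbers'])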
See also https://github.com/kubeflow/pipelines/issues/1654#issuecomment-661997829 #1654
/kind bug
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (13 by maintainers)
This is my current status:
The only cluster where it is not working is the one on Azure. I will have to check back with the maintainer of the Azure cluster and report back here. By the way, the Azure cluster is a bit outdated compared to the k3s and minikube clusters.
"Hmm, by chance did you have any PSP in the previous deployment? I noticed that a PSP without SYS_PTRACE throws that error."
You will get an error on pod startup if SYS_PTRACE is not enabled. This is the PSP that works for minikube and PNS.
Maybe it helps to add CAP_SYS_CHROOT? Maybe then allowPrivilegeEscalation: true becomes unnecessary.
"Can you give more details about the infeasibility?"
Well, hostPath and docker.sock access is a security issue. You cannot expect anyone to manage your cluster with that security hole.