pipelines: failed to save outputs: Error response from daemon: No such container
While trying to set up my own Kubeflow pipeline, I ran into a problem when a step finishes and its outputs should be saved. After the step finishes, Kubeflow always throws an error: “This step is in Error state with this message: failed to save outputs: Error response from daemon: No such container: <container-id>”
At first I thought I had made a mistake in my own pipeline, but the same happens with the preexisting example pipelines, e.g. for “[Sample] Basic - Conditional execution” I get this message after the first step (flip-coin) finishes.
The main container shows the following output:
heads
So it seems to have run successfully.
The wait container shows the following output:
time="2019-06-07T11:41:35Z" level=info msg="Creating a docker executor"
time="2019-06-07T11:41:35Z" level=info msg="Executor (version: v2.2.0, build_date: 2018-08-30T08:52:54Z) initialized with template:\narchiveLocation:\n s3:\n accessKeySecret:\n key: accesskey\n name: mlpipeline-minio-artifact\n bucket: mlpipeline\n endpoint: minio-service.kubeflow:9000\n insecure: true\n key: artifacts/conditional-execution-pipeline-vmdhx/conditional-execution-pipeline-vmdhx-2104306666\n secretKeySecret:\n key: secretkey\n name: mlpipeline-minio-artifact\ncontainer:\n args:\n - python -c \"import random; result = 'heads' if random.randint(0,1) == 0 else 'tails';\n print(result)\" | tee /tmp/output\n command:\n - sh\n - -c\n image: python:alpine3.6\n name: \"\"\n resources: {}\ninputs: {}\nmetadata: {}\nname: flip-coin\noutputs:\n artifacts:\n - name: mlpipeline-ui-metadata\n path: /mlpipeline-ui-metadata.json\n - name: mlpipeline-metrics\n path: /mlpipeline-metrics.json\n parameters:\n - name: flip-coin-output\n valueFrom:\n path: /tmp/output\n"
time="2019-06-07T11:41:35Z" level=info msg="Waiting on main container"
time="2019-06-07T11:41:36Z" level=info msg="main container started with container ID: 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c"
time="2019-06-07T11:41:36Z" level=info msg="Starting annotations monitor"
time="2019-06-07T11:41:36Z" level=info msg="docker wait 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c"
time="2019-06-07T11:41:36Z" level=info msg="Starting deadline monitor"
time="2019-06-07T11:41:37Z" level=error msg="`docker wait 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c` failed: Error response from daemon: No such container: 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c\n"
time="2019-06-07T11:41:37Z" level=info msg="Main container completed"
time="2019-06-07T11:41:37Z" level=info msg="No sidecars"
time="2019-06-07T11:41:37Z" level=info msg="Saving output artifacts"
time="2019-06-07T11:41:37Z" level=info msg="Annotations monitor stopped"
time="2019-06-07T11:41:37Z" level=info msg="Saving artifact: mlpipeline-ui-metadata"
time="2019-06-07T11:41:37Z" level=info msg="Archiving 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/mlpipeline-ui-metadata.json to /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-06-07T11:41:37Z" level=info msg="sh -c docker cp -a 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/mlpipeline-ui-metadata.json - | gzip > /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-06-07T11:41:37Z" level=info msg="Archiving completed"
time="2019-06-07T11:41:37Z" level=info msg="Creating minio client minio-service.kubeflow:9000 using static credentials"
time="2019-06-07T11:41:37Z" level=info msg="Saving from /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/conditional-execution-pipeline-vmdhx/conditional-execution-pipeline-vmdhx-2104306666/mlpipeline-ui-metadata.tgz)"
time="2019-06-07T11:41:37Z" level=info msg="Successfully saved file: /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-06-07T11:41:37Z" level=info msg="Saving artifact: mlpipeline-metrics"
time="2019-06-07T11:41:37Z" level=info msg="Archiving 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/mlpipeline-metrics.json to /argo/outputs/artifacts/mlpipeline-metrics.tgz"
time="2019-06-07T11:41:37Z" level=info msg="sh -c docker cp -a 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/mlpipeline-metrics.json - | gzip > /argo/outputs/artifacts/mlpipeline-metrics.tgz"
time="2019-06-07T11:41:37Z" level=info msg="Archiving completed"
time="2019-06-07T11:41:37Z" level=info msg="Creating minio client minio-service.kubeflow:9000 using static credentials"
time="2019-06-07T11:41:37Z" level=info msg="Saving from /argo/outputs/artifacts/mlpipeline-metrics.tgz to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/conditional-execution-pipeline-vmdhx/conditional-execution-pipeline-vmdhx-2104306666/mlpipeline-metrics.tgz)"
time="2019-06-07T11:41:37Z" level=info msg="Successfully saved file: /argo/outputs/artifacts/mlpipeline-metrics.tgz"
time="2019-06-07T11:41:37Z" level=info msg="Saving output parameters"
time="2019-06-07T11:41:37Z" level=info msg="Saving path output parameter: flip-coin-output"
time="2019-06-07T11:41:37Z" level=info msg="[sh -c docker cp -a 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/tmp/output - | tar -ax -O]"
time="2019-06-07T11:41:37Z" level=error msg="`[sh -c docker cp -a 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/tmp/output - | tar -ax -O]` stderr:\nError: No such container:path: 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/tmp/output\ntar: This does not look like a tar archive\ntar: Exiting with failure status due to previous errors\n"
time="2019-06-07T11:41:37Z" level=info msg="Alloc=4338 TotalAlloc=11911 Sys=10598 NumGC=4 Goroutines=11"
time="2019-06-07T11:41:37Z" level=fatal msg="exit status 2\ngithub.com/argoproj/argo/errors.Wrap\n\t/root/go/src/github.com/argoproj/argo/errors/errors.go:87\ngithub.com/argoproj/argo/errors.InternalWrapError\n\t/root/go/src/github.com/argoproj/argo/errors/errors.go:70\ngithub.com/argoproj/argo/workflow/executor/docker.(*DockerExecutor).GetFileContents\n\t/root/go/src/github.com/argoproj/argo/workflow/executor/docker/docker.go:40\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveParameters\n\t/root/go/src/github.com/argoproj/argo/workflow/executor/executor.go:343\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/root/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:49\ngithub.com/argoproj/argo/cmd/argoexec/commands.glob..func4\n\t/root/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:19\ngithub.com/argoproj/argo/vendor/github.com/spf13/cobra.(*Command).execute\n\t/root/go/src/github.com/argoproj/argo/vendor/github.com/spf13/cobra/command.go:766\ngithub.com/argoproj/argo/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/src/github.com/argoproj/argo/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/argoproj/argo/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/root/go/src/github.com/argoproj/argo/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/root/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:15\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:198\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2361"
So it seems there is a problem with either Kubeflow or my Docker daemon. The output of kubectl describe pod for the created pod is the following:
Name: conditional-execution-pipeline-vmdhx-2104306666
Namespace: kubeflow
Priority: 0
PriorityClassName: <none>
Node: root-nuc8i5beh/9.233.5.90
Start Time: Fri, 07 Jun 2019 13:41:29 +0200
Labels: workflows.argoproj.io/completed=true
workflows.argoproj.io/workflow=conditional-execution-pipeline-vmdhx
Annotations: workflows.argoproj.io/node-message:
Error response from daemon: No such container: 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c
workflows.argoproj.io/node-name: conditional-execution-pipeline-vmdhx.flip-coin
workflows.argoproj.io/template:
{"name":"flip-coin","inputs":{},"outputs":{"parameters":[{"name":"flip-coin-output","valueFrom":{"path":"/tmp/output"}}],"artifacts":[{"na...
Status: Failed
IP: 10.1.1.30
Controlled By: Workflow/conditional-execution-pipeline-vmdhx
Containers:
main:
Container ID: containerd://7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c
Image: python:alpine3.6
Image ID: docker.io/library/python@sha256:766a961bf699491995cc29e20958ef11fd63741ff41dcc70ec34355b39d52971
Port: <none>
Host Port: <none>
Command:
sh
-c
Args:
python -c "import random; result = 'heads' if random.randint(0,1) == 0 else 'tails'; print(result)" | tee /tmp/output
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 07 Jun 2019 13:41:35 +0200
Finished: Fri, 07 Jun 2019 13:41:35 +0200
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from pipeline-runner-token-xh2p7 (ro)
wait:
Container ID: containerd://f0449dc70c0a651c09aeb883edda9ce0ec5e415fa15a5468fe5b360fb06637c2
Image: argoproj/argoexec:v2.2.0
Image ID: docker.io/argoproj/argoexec@sha256:eea81e0b0d8899a0b7f9815c9c7bd89afa73ab32e5238430de82342b3bb7674a
Port: <none>
Host Port: <none>
Command:
argoexec
Args:
wait
State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 07 Jun 2019 13:41:35 +0200
Finished: Fri, 07 Jun 2019 13:41:37 +0200
Ready: False
Restart Count: 0
Environment:
ARGO_POD_NAME: conditional-execution-pipeline-vmdhx-2104306666 (v1:metadata.name)
Mounts:
/argo/podmetadata from podmetadata (rw)
/var/lib/docker from docker-lib (ro)
/var/run/docker.sock from docker-sock (ro)
/var/run/secrets/kubernetes.io/serviceaccount from pipeline-runner-token-xh2p7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
podmetadata:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.annotations -> annotations
docker-lib:
Type: HostPath (bare host directory volume)
Path: /var/lib/docker
HostPathType: Directory
docker-sock:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
HostPathType: Socket
pipeline-runner-token-xh2p7:
Type: Secret (a volume populated by a Secret)
SecretName: pipeline-runner-token-xh2p7
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m1s default-scheduler Successfully assigned kubeflow/conditional-execution-pipeline-vmdhx-2104306666 to root-nuc8i5beh
Normal Pulling 8m1s kubelet, root-nuc8i5beh Pulling image "python:alpine3.6"
Normal Pulled 7m56s kubelet, root-nuc8i5beh Successfully pulled image "python:alpine3.6"
Normal Created 7m56s kubelet, root-nuc8i5beh Created container main
Normal Started 7m55s kubelet, root-nuc8i5beh Started container main
Normal Pulled 7m55s kubelet, root-nuc8i5beh Container image "argoproj/argoexec:v2.2.0" already present on machine
Normal Created 7m55s kubelet, root-nuc8i5beh Created container wait
Normal Started 7m55s kubelet, root-nuc8i5beh Started container wait
So is there perhaps a problem with the argoexec container image? I see it tries to mount /var/run/docker.sock. When I try to read this file with cat I get “No such device or address”, even though I can see the file with ls /var/run. When I try to open it with vi, it says permission was denied, so I cannot look inside the file. Is this the usual behavior for this file, or does it look like something is wrong with it?
I would really appreciate any help I can get! Thank you guys!
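An aside on the docker.sock behavior asked about above: “No such device or address” from cat is the expected result of reading a unix socket as if it were a regular file, so that by itself is not a sign of a problem. A socket has to be connected to instead; a quick probe (assuming curl 7.40+ with unix-socket support and read permission on the socket):
# Talk to the Docker daemon over its socket instead of reading it as a file
curl --unix-socket /var/run/docker.sock http://localhost/version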
I know this is several months old but FWIW, with microk8s v1.15.3 and kubeflow v0.6, I solved this issue by changing the kubelet container-runtime from remote to docker by editing /var/snap/microk8s/current/args/kubelet (a rough sketch of that edit follows the next comment).
On GCP, if you are using AI Platform Pipelines and are having this issue, you need to edit your Kubernetes deployment and change the node image type from containerd to docker. This worked fine for me. My k8s version was 1.19 and my Kubeflow version was 1.4.1.
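For the microk8s fix above, the edit would look roughly like this; treat it as a hedged sketch, since the exact default flags vary by microk8s version:
# /var/snap/microk8s/current/args/kubelet
# Before (assumed microk8s defaults): kubelet drives containerd via the remote runtime:
#   --container-runtime=remote
#   --container-runtime-endpoint=${SNAP_COMMON}/run/containerd.sock
# After: point kubelet at Docker, so the containers that argoexec looks up
# through /var/run/docker.sock actually exist in the Docker daemon:
--container-runtime=docker
followed by a kubelet restart (on microk8s, something like sudo systemctl restart snap.microk8s.daemon-kubelet).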
Sorry to re-open the issue.
I am currently in the process of deploying a TensorFlow Extended (TFX) pipeline (v1 release candidate) on KFP 1.14 via the Google Cloud Platform Marketplace.
Unfortunately, I am running into the same issue.
Can someone elaborate on how to tackle this in Kubeflow Pipelines on GCP?
Much appreciated!
FYI, I met the same issue, but I switched to pns because I faced the following error when I tried k8sapi.
When I tried to use pns via the environment variable as in the previous comment, it failed again with the error “process namespace sharing is not enabled on pod”, so I just set it directly as follows and it worked.
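The snippet that went with “set it directly” is missing here; as a hedged reconstruction, setting the executor directly on a stock KFP install usually means editing the Argo workflow-controller ConfigMap (the name and namespace below are assumptions for a default Kubeflow deployment, and the config block should be merged into what is already there, not replace it):
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: kubeflow
data:
  config: |
    # With pns, the controller sets shareProcessNamespace on workflow pods,
    # which is what the "process namespace sharing is not enabled on pod"
    # error was complaining about.
    containerRuntimeExecutor: pns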
See also https://github.com/kubeflow/pipelines/issues/1654, which contains an interesting discussion of executors.
I’ve encountered the same problem on GCP’s AI Platform Pipelines as well.
The component process looked like it completed, but an error occurred during the “wait” process.
Below is the logging detail.
I would really appreciate some help. Thank you.
pns is the key, it worked for me, thanks!
See the GKE release notes for 1.19.9 - they move away from the Docker runtime.
In our case, we were trying to run Argo workflows, which use the Docker container runtime by default. We upgraded a cluster to Kubernetes 1.19.9 - which changes the default runtime to containerd - and suddenly none of our workflows would start, with our “wait” containers also complaining that they could not find containers. The solution for us was to explicitly tell Argo workflows to use a container runtime other than Docker (we switched to k8sapi). See the Helm chart containerRuntimeExecutor value and the possible argo-workflows executor environment variables; a sketch of that switch follows.
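As a hedged illustration of that switch (the value path is taken from the community argo-workflows Helm chart and may differ across chart versions):
# values.yaml for the argo-workflows Helm chart
controller:
  # Use the Kubernetes API instead of docker.sock to inspect containers and
  # collect outputs, so the executor also works on containerd-only nodes.
  containerRuntimeExecutor: k8sapi
One caveat: the k8sapi executor cannot copy arbitrary files out of a container’s filesystem, so output artifacts and parameters generally need to sit on shared volumes.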
I’m finding that I don’t have /var/snap/microk8s/current/docker.sock or /var/snap/microk8s/common/var/lib/docker.
I have noticed that when I begin a new run, a new snapshot is created under containerd with a docker.sock and a lib/docker.
Finding docker.sock
sudo find /var/snap/microk8s -name "docker.sock"
returns:
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2733/fs/run/docker.sock
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2730/fs/run/docker.sock
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2727/fs/run/docker.sock
Finding lib/docker
sudo find /var/snap/microk8s -name "docker" -type d | grep "lib/docker"
returns:
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2733/fs/var/lib/docker
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2730/fs/var/lib/docker
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2727/fs/var/lib/docker
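Those paths all live inside containerd’s overlayfs snapshots, i.e. inside individual container filesystems, not on the host. A quick way to confirm which runtime a node is really using (standard kubectl, not from the original thread):
kubectl get nodes -o wide
# The CONTAINER-RUNTIME column shows containerd://... vs docker://...; with
# containerd there is no host-level /var/run/docker.sock for the Docker
# executor to query, which is exactly the failure in this issue.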