kaniko: "gcr.io/kaniko-project/executor:latest" failed: step exited with non-zero status: 137
Actual behavior
I am running a build on Cloud Build. The build succeeds, but the caching snapshot at the end fails with the following messages:
Step #0: INFO[0154] Taking snapshot of full filesystem…
Finished Step #0 ERROR ERROR: build step 0 "gcr.io/kaniko-project/executor:latest" failed: step exited with non-zero status: 137
Expected behavior
I would like the whole build to succeed, including caching.
To Reproduce
Steps to reproduce the behavior:
- Build on GCP Cloud Build using a cloudbuild.yaml with Kaniko caching enabled.
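A minimal cloudbuild.yaml matching this setup might look like the sketch below. The image name is a placeholder; `--destination`, `--cache`, and `--cache-ttl` are standard kaniko executor flags:

```yaml
steps:
  - name: 'gcr.io/kaniko-project/executor:latest'
    args:
      - --destination=gcr.io/$PROJECT_ID/my-image  # placeholder image name
      - --cache=true       # enable kaniko layer caching
      - --cache-ttl=6h     # how long cached layers remain valid
timeout: 1800s
```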
Additional Information
I cannot provide the Dockerfile, but it is based on continuumio/miniconda3 and also installs tensorflow in a conda environment. I think it started failing after tensorflow was added to the list of dependencies.
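The original Dockerfile is not shared, but a minimal sketch of the described setup (the environment name and Python version are assumptions, and tensorflow is left unpinned) would be something like:

```Dockerfile
FROM continuumio/miniconda3

# Create a conda environment and install tensorflow into it.
# The environment name "app" is a placeholder.
RUN conda create -y -n app python=3.9 && \
    conda run -n app pip install tensorflow
```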
About this issue
- Original URL
- State: open
- Created 3 years ago
- Reactions: 16
- Comments: 17
Commits related to this issue
- Try kaniko v1.3.0 following https://github.com/GoogleContainerTools/kaniko/issues/1669#issuecomment-1207305515 — committed to GSS-Cogs/dd-cms by ajtucker 2 years ago
- Disable cache compression Disable cache compression to allow large images, like images depending on `tensorflow` or `torch`. For more information, see: https://github.com/GoogleContainerTools/kani... — committed to davidcavazos/beam by davidcavazos a year ago
If you add `--compressed-caching=false` it works for me on 1.9.0.

`--compressed-caching=false` worked well for most things except for `COPY <src> <dst>`, and it turns out there's also `--cache-copy-layers`. I was still getting crushed by `pytorch` installations. This is the cloudbuild.yaml that works really well now.

Any news on this? Still happening on v1.9.0.
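Putting the flags mentioned in this thread together, a hedged sketch of a cloudbuild.yaml using them (the image name is a placeholder, and per the comments above `--compressed-caching` is available as of v1.9.0):

```yaml
steps:
  - name: 'gcr.io/kaniko-project/executor:v1.9.0'
    args:
      - --destination=gcr.io/$PROJECT_ID/my-image  # placeholder image name
      - --cache=true
      - --compressed-caching=false  # avoid OOM (exit 137) while snapshotting large layers
      - --cache-copy-layers         # also cache layers produced by COPY instructions
```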
Any update on this issue? I am facing the same problem when deploying an ML image with `sentence-transformers` and `torch>=1.6.0`. The image size is more than 3 GB.
Looks like it worked, but I tried with the cache disabled. On 1.6 it was failing even with the cache disabled, so that's a good sign.
Any update on this topic? I have this issue with every ML-related Dockerfile where we need to use PyTorch and other large libraries.