kaniko: "gcr.io/kaniko-project/executor:latest" failed: step exited with non-zero status: 137
Actual behavior
I am running a build on Cloud Build. The build succeeds, but the caching snapshot at the end fails with the following messages:
Step #0: INFO[0154] Taking snapshot of full filesystem…
Finished Step #0 ERROR ERROR: build step 0 "gcr.io/kaniko-project/executor:latest" failed: step exited with non-zero status: 137
Expected behavior
I would like the whole build to succeed, including caching.
To Reproduce
Steps to reproduce the behavior:
- Build on GCP Cloud Build using a cloudbuild.yaml with Kaniko caching enabled.
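A minimal cloudbuild.yaml matching this setup might look like the sketch below. The image name is a placeholder; `--destination`, `--cache`, and `--cache-ttl` are standard kaniko executor flags:

```yaml
steps:
  - name: 'gcr.io/kaniko-project/executor:latest'
    args:
      - --destination=gcr.io/$PROJECT_ID/my-image  # placeholder image name
      - --cache=true       # enable kaniko layer caching
      - --cache-ttl=6h     # how long cached layers remain valid
timeout: 1800s
```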
Additional Information
I cannot provide the Dockerfile, but it is based on continuumio/miniconda3 and also installs tensorflow in a conda environment. I think it started failing after tensorflow was added to the list of dependencies.
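The original Dockerfile is not shared, but a minimal sketch of the described setup (the environment name and Python version are assumptions, and tensorflow is left unpinned) would be something like:

```Dockerfile
FROM continuumio/miniconda3

# Create a conda environment and install tensorflow into it.
# The environment name "app" is a placeholder.
RUN conda create -y -n app python=3.9 && \
    conda run -n app pip install tensorflow
```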
About this issue
- Original URL
- State: open
- Created 3 years ago
- Reactions: 16
- Comments: 17
Commits related to this issue
- Try kaniko v1.3.0 following https://github.com/GoogleContainerTools/kaniko/issues/1669#issuecomment-1207305515 — committed to GSS-Cogs/dd-cms by ajtucker 2 years ago
- Disable cache compression Disable cache compression to allow large images, like images depending on `tensorflow` or `torch`. For more information, see: https://github.com/GoogleContainerTools/kani... — committed to davidcavazos/beam by davidcavazos a year ago
If you add `--compressed-caching=false` it works for me on 1.9.0.

`--compressed-caching=false` worked well for most things except for `COPY <src> <dst>`, and it turns out there's also `--cache-copy-layers`. I was still getting crushed by `pytorch` installations. This is the cloudbuild.yaml that works really well now.

Any news on this? Still happening on v1.9.0.
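Putting the flags mentioned in this thread together, a hedged sketch of a cloudbuild.yaml using them (the image name is a placeholder, and per the comments above `--compressed-caching` is available as of v1.9.0):

```yaml
steps:
  - name: 'gcr.io/kaniko-project/executor:v1.9.0'
    args:
      - --destination=gcr.io/$PROJECT_ID/my-image  # placeholder image name
      - --cache=true
      - --compressed-caching=false  # avoid OOM (exit 137) while snapshotting large layers
      - --cache-copy-layers         # also cache layers produced by COPY instructions
```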
Any update on this issue? I am facing the same problem when deploying an ML image with `sentence-transformers` and `torch>=1.6.0`. The image size is more than 3 GB.
Looks like it worked, but I tried with the cache disabled. On 1.6 it was failing even with the cache disabled, so that's a good sign.
Any update on this topic? I have this issue with every ML-related Dockerfile where we need to use PyTorch and other large libraries.