kaniko: Image build process Freezes on `Taking snapshot of full filesystem...`
Actual behavior
While building an image using gcr.io/kaniko-project/executor:debug in a GitLab CI runner hosted on Kubernetes (deployed via the Helm chart), the build freezes on `Taking snapshot of full filesystem...` until the runner times out (1 hr). The behaviour is intermittent: for the same project, the image build stage sometimes succeeds.
The issue arises with multi-stage as well as single-stage Dockerfiles.
Expected behavior
The image build should not freeze at `Taking snapshot of full filesystem...` and should succeed every time.
To Reproduce
As the behaviour is intermittent, I am not sure how it can be reproduced.
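For context, a job of the kind described above looks roughly like this. This is a minimal sketch; the job name, stage, and destination tag are illustrative assumptions, not taken from the affected project:

```yaml
# Minimal GitLab CI job of the kind that exhibits the freeze.
# Job name, stage, and destination are assumptions for illustration only.
build-image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]          # the debug image needs an empty entrypoint in GitLab CI
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHA}"
  timeout: 1h                 # the runner timeout after which the frozen job is killed
```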
| Description | Yes/No |
|---|---|
| Please check if this is a new feature you are proposing | |
| Please check if the build works in docker but not in kaniko | |
| Please check if this error is seen when you use `--cache` flag | |
| Please check if your dockerfile is a multistage dockerfile | |
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 19
- Comments: 47 (4 by maintainers)
The issue is still present for me too. Any updates?
I am experiencing this problem while building an image of less than a GB. Interestingly, it fails silently: the GitLab CI job is marked as successful, but no image is actually pushed.
We are using kaniko for several other projects, but this error only happens on two projects. Both are monorepos and use Lerna to extend yarn commands to sub-packages.
I must say it was working at some point, and it does work normally when using docker to build the image.
Here is a snippet of the build logs:
Interesting to note that `RUN yarn install --network-timeout 100000` is not the last step in the dockerfile.

Neither `--snapshotMode=redo` nor `--use-new-run` solved the problem.

We could fix the GitLab CI/CD pipeline error with `--compressed-caching=false` and `v1.8.0-debug`. The image is around 2 GB. Alpine reported around 4 GB in around 100 packages.

Adding a data point: I was initially observing the build-process freeze when I did not set any memory/CPU requests/limits. Once I added memory/CPU requests and limits, the process started getting OOM-killed instead. I increased the memory limit to 6 GB, but it still got OOM-killed. Looking at memory usage, it skyrockets at the end, when the log reaches "taking snapshot of file system". EDIT: I tried building the same image in local docker, and the maximum memory usage is less than 1 GB.
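Taken together, the two reports above translate into something like the following pod sketch: pin the executor to v1.8.0-debug, pass `--compressed-caching=false`, and set explicit memory requests/limits. The pod name, build context, destination, and resource values are illustrative assumptions, not a recommended configuration:

```yaml
# Sketch of a Kubernetes pod combining the workarounds reported above:
# v1.8.0-debug, --compressed-caching=false, and explicit memory requests/limits.
# Pod name, build context, destination, and resource values are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:v1.8.0-debug
      command: ["/kaniko/executor"]
      args:
        - --context=git://example.com/group/project.git
        - --dockerfile=Dockerfile
        - --destination=registry.example.com/group/project:latest
        - --compressed-caching=false   # reported above to avoid the memory spike while snapshotting
      resources:
        requests:
          memory: 2Gi
        limits:
          memory: 6Gi                  # the limit that was still OOM-killed without the flag above
```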
logs
version: gcr.io/kaniko-project/executor:v1.6.0-debug
args: I added `--snapshotMode=redo`, `--cache=true`
env: GKE 1.19, using Kubeflow Pipelines to run kaniko containers
Same issue; nothing changed except the version of kaniko.
Hold on a second, maybe I spoke too early!
My pipeline currently builds multiple images in parallel. I didn't realize before that one of them, which previously got stuck in taking snapshot, now goes through smoothly with
`--snapshotMode=redo --use-new-run` and `gcr.io/kaniko-project/executor:09e70e44d9e9a3fecfcf70cb809a654445837631-debug`.

The images actually stuck are basically the same Postgres image built with different `build-arg` values, so this ends up running (and caching) the same layers in parallel. I consequently tried to remove this parallelism and build these Postgres images in sequence. I ended up with Postgres images stuck in taking snapshot in parallel with a totally different NodeJS image, also stuck in taking snapshots.
So from my tests it looks like, when images are built in parallel against the same registry mirror used as cache, if one image is taking snapshots in parallel with another, it gets stuck.
It may be a coincidence, maybe not. I repeat: this is from my tests; it could be totally unrelated to the problem.
Edit: my guess was wrong. I reverted to kaniko:1.3.0-debug and added sufficient memory requests & limits, but I'm still observing the image-build freeze from time to time.
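For anyone who wants to retry that experiment, the flag combination and the serialized builds described above can be expressed roughly as follows. This is a hypothetical job: only the two flags and the pinned executor build come from the comment, `resource_group` is one way to serialize jobs that would otherwise run in parallel, the build-arg value is invented, and note that per the edit this did not reliably fix the freeze:

```yaml
# Hypothetical GitLab CI job using the flag combination and executor build
# mentioned above. resource_group serializes jobs sharing the same group name,
# which approximates "removing the parallelism"; the build-arg is an assumption.
build-postgres:
  stage: build
  resource_group: kaniko-builds
  image:
    name: gcr.io/kaniko-project/executor:09e70e44d9e9a3fecfcf70cb809a654445837631-debug
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}/postgres:${CI_COMMIT_SHA}"
      --snapshotMode=redo
      --use-new-run
      --build-arg "PG_VERSION=14"
```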
Hello everyone! I found a solution here: https://stackoverflow.com/questions/67748472/can-kaniko-take-snapshots-by-each-stage-not-each-run-or-copy-operation. Add the `--single-snapshot` option to kaniko:
/kaniko/executor --context "${CI_PROJECT_DIR}" --dockerfile "${CI_PROJECT_DIR}/Dockerfile" --destination "${YC_CI_REGISTRY}/${YC_CI_REGISTRY_ID}/${CI_PROJECT_PATH}:${CI_COMMIT_SHA}" --single-snapshot
I have the same issue on GitLab CI/CD, but only when cache is set to true.