kaniko: Repeated builds using cache produce broken images
Actual behavior

Hey, we’re having some problems with kaniko 0.19 and Python. We have a Dockerfile that looks vaguely like this:
```dockerfile
FROM python:3.7.4-slim AS service-base
RUN apt-get update \
    && apt-get --assume-yes install curl gpg libcurl4-openssl-dev libssl-dev gcc git build-essential netcat libsnappy-dev
RUN pip install poetry=="1.0.2"
COPY ./_packages /api/_packages
COPY ./config /config
WORKDIR /service/cat
COPY . /service/cat
RUN rm -rf _packages
RUN rm -rf config
RUN \
    poetry config virtualenvs.create false && \
    poetry install
```
For reasons outside our control, we can’t reorganize the source code to remove the `RUN rm` lines.
What we’re seeing is that if our ECR cache repo is empty, the image is fine.
We then build an image using the cache, but change one of the later layers (for example, by adding a file to /service/cat).
The build completes, but when the image is pulled from ECR onto a kubelet, we see:
```
Failed to pull image "{repo_url}": rpc error: code = Unknown desc = failed to register layer: Error processing tar file(exit status 1): file exists
```
Expected behavior

Subsequent builds using the cache should produce an image that can be run on Kubernetes.
To Reproduce

See above.
Additional Information
- Dockerfile: see above
- Build context: set up a Poetry project analogous to the one in https://github.com/python-poetry/poetry/issues/1757
- Kaniko image: gcr.io/kaniko-project/executor@sha256:0d0e34396f47ec6d5fd75aebb9772147a78d96ed2bbb16ec892bd178efdc8307
Triage Notes for the Maintainers

| Description | Yes/No |
|---|---|
| Please check if this is a new feature you are proposing | |
| Please check if the build works in docker but not in kaniko | |
| Please check if this error is seen when you use `--cache` flag | |
| Please check if your dockerfile is a multistage dockerfile | |
About this issue
- State: closed
- Created 4 years ago
- Reactions: 17
- Comments: 38 (19 by maintainers)
Commits related to this issue
- Downgrade to kaniko 0.16 because of regression https://github.com/GoogleContainerTools/kaniko/issues/1162 — committed to datawire/aes-project-builder by LukeShu 4 years ago
- Use kaniko to build qa image Use kaniko v0.16.0 https://github.com/GoogleContainerTools/kaniko/issues/1162 — committed to terrchen/gitlab by deleted user 4 years ago
- fix cache bug bump executor version to fix this bug: https://github.com/GoogleContainerTools/kaniko/issues/1162 — committed to hatsuyuki15/drone-kaniko by hatsuyuki15 3 years ago
We’re continuously bumping into this as well and it makes the cache completely unusable. Is there anything we can do to help prioritise this?
Thanks for all your effort.
Finally managed to reproduce it consistently and figured out that it has to do with incorrect whiteout of certain files. I am still not very familiar with the logic in that area, but seeing that it was recently refactored in #1069, perhaps @tejal29 or @cvgw have a better intuition on what may have gone wrong?
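The comment above traces the failure to incorrect whiteouts. As background: in the OCI image layer format, a path deleted by a later step (here, `RUN rm -rf _packages` and `RUN rm -rf config`) is not removed from earlier layers. Instead, the new layer contains a marker named `.wh.<name>` in the same directory. The minimal Python sketch below illustrates only that naming convention, assuming each snapshot is a flat `path -> digest` map; `diff_layer` and `whiteout_for` are hypothetical helpers, not kaniko’s Go code.

```python
from __future__ import annotations

import posixpath

# Prefix the OCI image layer format uses to mark a deleted path.
WHITEOUT_PREFIX = ".wh."


def whiteout_for(path: str) -> str:
    """Return the whiteout entry that records the deletion of `path`."""
    parent, name = posixpath.split(path)
    return posixpath.join(parent, WHITEOUT_PREFIX + name)


def diff_layer(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Compare two snapshots (path -> content digest) and list the entries a
    new layer would contain: added or changed files, plus one whiteout per
    path that disappeared. Directories are not modeled separately."""
    entries = [p for p, digest in after.items() if before.get(p) != digest]
    entries += [whiteout_for(p) for p in before if p not in after]
    return sorted(entries)


# Hypothetical snapshots around `RUN rm -rf _packages` in WORKDIR /service/cat.
before = {
    "/service/cat/_packages": "sha256:aaa",
    "/service/cat/pyproject.toml": "sha256:bbb",
}
after = {"/service/cat/pyproject.toml": "sha256:bbb"}
print(diff_layer(before, after))  # prints ['/service/cat/.wh._packages']
```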
To Reproduce:
If you run on certain Linux kernels, the build may fail with the latest kaniko image (see issue 1202). In that case, run `make images` and build a local image from the source code. That will fix it.

Dockerfile:
Context:
pyproject.toml is the only file whose contents are meaningful. The text files are just “hello world”.
Build command to GCR:
Build it once, and you’ll see an image on GCR with a size of ~190MB that can be pulled. Then edit `pyproject.toml` and add a new dependency, e.g. `pytz = "2020.1"`. Build it again, and you’ll see a new image, this time with a size of ~320MB, that fails to be pulled.

Some observations:
`resolve.go`), you can see that at some stage, `pager.1.gz` is whited out and appears as `.wh.pager.1.gz` when files are resolved. IIUC, this is what the eventual error is about: it tries to create a file that already exists on the lower layers of the overlay FS.
`snapshot.go`, making it skip `pager.1.gz`:
The produced image still has the wrong size (320MB instead of 190MB), but it can actually be pulled normally. This indicates that the bug that doubled the layer’s size is still not resolved.
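The observations above concern how whiteouts are resolved. For reference, the OCI image layer spec states that a whiteout hides a path only in lower layers, while a file present in the same layer as its whiteout stays visible. The Python sketch below models that merge rule under simplifying assumptions: a layer is a flat list of file paths, opaque whiteouts are not handled, and `apply_layers` plus the example paths are hypothetical. It does not reproduce kaniko’s `resolve.go` or how Docker applies layers on disk.

```python
from __future__ import annotations

import posixpath

# Markers defined by the OCI image layer format.
WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"  # opaque directory whiteout, not handled here


def apply_layers(layers: list[list[str]]) -> set[str]:
    """Merge layers (lowest first) into the set of visible file paths.

    A whiteout `.wh.<name>` hides `<name>` and everything beneath it in
    lower layers only, so each layer's whiteouts are applied before its
    regular entries are added.
    """
    visible: set[str] = set()
    for layer in layers:
        regular = []
        for entry in layer:
            parent, name = posixpath.split(entry)
            if name == OPAQUE_MARKER:
                raise NotImplementedError("opaque whiteouts are out of scope")
            if name.startswith(WHITEOUT_PREFIX):
                target = posixpath.join(parent, name[len(WHITEOUT_PREFIX):])
                visible = {
                    p for p in visible
                    if p != target and not p.startswith(target + "/")
                }
            else:
                regular.append(entry)
        visible.update(regular)
    return visible


man_page = "/usr/share/man/man1/pager.1.gz"
base = [man_page, "/service/cat/_packages/lib.py"]
cleanup = ["/service/cat/.wh._packages"]
print(sorted(apply_layers([base, cleanup])))
# prints ['/usr/share/man/man1/pager.1.gz']

# A layer holding both the whiteout and the file keeps the file visible.
same_layer = ["/usr/share/man/man1/.wh.pager.1.gz", man_page]
print(man_page in apply_layers([base, same_layer]))  # prints True
```

Listing the entries of a failing layer tarball (for example, with `tar -tvf`) and comparing them against this rule may help narrow down which whiteouts were emitted unexpectedly.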
It is NOT fixed in debug-v0.20.0, so you’ll need a
before your echo. GitLab also needs to change its documentation.
We are also hitting this problem.

Image: `gcr.io/kaniko-project/executor:debug`

Command:

```shell
/kaniko/executor --context=${context} --dockerfile=${Dockerfile} --destination=${image} --cache=true --cache-repo=${IMAGE_CACHE}
```

More detail in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/28988