kaniko: image builds are unbearably slow

Actual behavior Hi all: I want to use kaniko to build images in Tekton, but it is too slow, taking over 40 minutes, whereas building the same image with Docker takes only a few minutes. So I tried running kaniko via Docker to see what's going on. The time seems to be spent in the "unpack rootfs" phase, and it sometimes fails with "stream error: stream ID 17; INTERNAL_ERROR".

Expected behavior kaniko should build images roughly as fast as Docker does; only then is it really usable.

To Reproduce Steps to reproduce the behavior:

  1. this is my test repo: https://github.com/YuhuWang2002/empty-proj.git
  2. run the build.sh to start kaniko
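
build.sh itself is not reproduced here, but a minimal way to run kaniko through Docker against a checkout of that repo might look roughly like this (the image tag, mount path, and --no-push are illustrative assumptions, not taken from the repo's script):

docker run --rm \
  -v "$(pwd)":/workspace \
  gcr.io/kaniko-project/executor:latest \
  --dockerfile=/workspace/Dockerfile \
  --context=dir:///workspace \
  --no-push

With a run like this, the slowness described above shows up around kaniko's "Unpacking rootfs" log output.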

Additional Information

  • Dockerfile
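# Stage 1: build the application jar with Maven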
FROM maven:3.6.3-jdk-8-openj9 as builder
COPY ./settings.xml /usr/share/maven/conf/settings.xml
COPY ./demo /app
WORKDIR /app
RUN mvn package

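# Stage 2: slim JRE runtime image; copies the jar built in the builder stage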
FROM openjdk:8-jre-alpine
ENV APP_FILE demo-v1.jar
COPY --from=builder /app/target/*.jar /app/
WORKDIR /app
RUN touch xxxx
EXPOSE 8080
ENTRYPOINT ["sh", "-c"]
CMD ["exec java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -jar $APP_FILE"]

Triage Notes for the Maintainers

  • Please check if this is a new feature you are proposing: No
  • Please check if the build works in docker but not in kaniko: Yes
  • Please check if this error is seen when you use the --cache flag: No
  • Please check if your Dockerfile is a multistage Dockerfile: Yes

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 9
  • Comments: 21 (6 by maintainers)

Most upvoted comments

We are hitting the same issue. This is a very important factor in our decision on image build tooling. Is there a plan to optimize it? Thank you so much.

This issue is not resolved. Closing it doesn’t make sense.

Same behavior. Extraction to filesystem takes too much time.

The idea of kaniko is great, thank you for this project, waiting for news on this issue!

Any movement on this? I just had a build go from 10m to 45m when I split up my Dockerfile steps into various multi-stage parts without introducing any new actual steps.

Just chiming in to say we, too, are having problems with build speed in Kaniko. We’re mainly using Kaniko as part of our GitLab CI pipelines, because we don’t want to enable privileged containers for our runners. In this regard, it’s great that Kaniko exists at all, because otherwise we would just be out of luck.

But speed-wise, the executor build takes forever for any “Using caching version of cmd:” or “Unpacking rootfs” step.

Some information about the machine we’re running this on: it’s a 16-core VM host with 32 GB of memory allocated to the VM, and ext4 block storage backed by NFS at the hypervisor level. The bottleneck is our uplink, which is about 16 Mb/s symmetric.

I have not checked yet whether the processes that take so long are bottlenecked by uplink speed, though at least the cache retrieval should happen over local ethernet (from our private gitlab registry).

We have turned on registry caching, but have not yet used base image caching since we’d need to set up a warmer for that (it would be nice if kaniko could generate caches on the fly^^'). But I suspect this might not be the main issue, owing to the fact that cache extractions are also very slow.

I hope this information helps a bit. I really like the concept of Kaniko and would like to be able to confidently tell our team that this is what we’ll be using, for the security aspect alone!

If I can provide any more information let me know.
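
For readers in a similar situation, the base-image warmer setup mentioned above looks roughly like the sketch below; cache paths, image names, and the destination are placeholders rather than the commenter's actual configuration:

# Pre-pull base images into a shared cache directory with the kaniko warmer
docker run --rm -v "$(pwd)/kaniko-cache":/cache \
  gcr.io/kaniko-project/warmer:latest \
  --cache-dir=/cache \
  --image=maven:3.6.3-jdk-8-openj9 \
  --image=openjdk:8-jre-alpine

# Point the executor at that cache, in addition to registry layer caching
docker run --rm \
  -v "$(pwd)":/workspace \
  -v "$(pwd)/kaniko-cache":/cache \
  gcr.io/kaniko-project/executor:latest \
  --dockerfile=/workspace/Dockerfile \
  --context=dir:///workspace \
  --cache=true \
  --cache-dir=/cache \
  --destination=registry.example.com/myimage:latest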

Same here. I have a pipeline step to create a “cache” image for building multiple node apps from the same monorepo (nrwl/nx). It’s a warmer.

“Extracting to fileSystem” takes at least 15 minutes. On Docker, it takes a second. This is running on an 8-vCPU, 32 GB RAM burstable node on AKS. I’ve tried every trick I know to get the build as fast as possible. Just unpacking a “cached” step takes longer than creating the cache image itself… I’m not sure how useful the cache is at this point.

E1126 19:57:29.672455      18 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO[0002] Resolved base name node:lts-slim to builder  
INFO[0002] Using dockerignore file: /work/source/.dockerignore 
INFO[0002] Retrieving image manifest node:lts-slim      
INFO[0002] Retrieving image node:lts-slim from registry index.docker.io 
INFO[0003] Retrieving image manifest node:lts-slim      
INFO[0003] Returning cached image manifest              
INFO[0003] Retrieving image manifest nginx:alpine       
INFO[0003] Retrieving image nginx:alpine from registry index.docker.io 
INFO[0004] Retrieving image manifest nginx:alpine       
INFO[0004] Returning cached image manifest              
INFO[0005] Built cross stage deps: map[0:[/app/dist/apps/da]] 
INFO[0005] Retrieving image manifest node:lts-slim      
INFO[0005] Returning cached image manifest              
INFO[0005] Retrieving image manifest node:lts-slim      
INFO[0005] Returning cached image manifest              
INFO[0005] Executing 0 build triggers                   
INFO[0005] Checking for cached layer myacr/cache:fa2b3731fbf527659bdcf7ddff328489a8b2f62cc29f08c655b6393543c8da0b... 
INFO[0005] Using caching version of cmd: COPY package.json . 
INFO[0005] Checking for cached layer myacr/cache:db7f8f8e7653b50c23cae432ab4fa4f6dccd743a1c4f82f9332ad67a94c22410... 
INFO[0005] Using caching version of cmd: COPY yarn.lock . 
INFO[0005] Checking for cached layer myacr/cache:d8bbd365c352d1aa3ba92fd1b2d48040768c3176d2048f1668edcf6e0f925fbe... 
INFO[0006] Using caching version of cmd: RUN yarn --frozen-lockfile 
INFO[0007] Checking for cached layer myacr/cache:837ea83ff85ab098196dbcbbdcdf940cab6431410ab98a8bb2e96617c023b0e4... 
INFO[0007] Using caching version of cmd: COPY . .       
INFO[0007] Checking for cached layer myacr/cache:529348db565d4c2bac6f6bc8264665506e58bc1312127c08b1f05630952cda76... 
INFO[0007] Using caching version of cmd: RUN chown -R node:node . 
INFO[0007] cmd: USER                                    
INFO[0007] Checking for cached layer myacr/cache:a9106c354b89962584ab4730446e25c3666ed1e54def60d5aec0ae81235af398... 
INFO[0008] Using caching version of cmd: RUN npx nx run-many --target=build --configuration=production --all --parallel=7 && rm -rf node_modules 
INFO[0015] WORKDIR /app                                 
INFO[0015] cmd: workdir                                 
INFO[0015] Changed working directory to /app            
INFO[0015] Creating directory /app                      
INFO[0015] Taking snapshot of files...                  
INFO[0015] COPY package.json .                          
INFO[0015] Found cached layer, extracting to filesystem 
INFO[0015] COPY yarn.lock .                             
INFO[0015] Found cached layer, extracting to filesystem 
INFO[0015] RUN yarn --frozen-lockfile                   
INFO[0015] Found cached layer, extracting to filesystem   <----- 15 minutes at least

Even running a simple chown takes 10 minutes

INFO[0948] RUN chown -R node:node .                     
INFO[0948] Found cached layer, extracting to filesystem <-- 10 minutes
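
For context, the myacr/cache:* layers in the log above come from kaniko's registry layer cache, which is enabled with flags along these lines (repository and image names here are placeholders):

/kaniko/executor \
  --dockerfile=Dockerfile \
  --context=dir:///workspace \
  --cache=true \
  --cache-repo=myregistry.azurecr.io/cache \
  --destination=myregistry.azurecr.io/myapp:latest

As the log shows, the cache lookups themselves take only seconds; it is the "Found cached layer, extracting to filesystem" step that dominates the build time.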

We are having the same issue. For us it is happening with Alpine images only.

@peacememories the issue is that Kaniko, because it is not able to access the Docker machinery, cannot use the overlay filesystem driver, which is what the Docker Daemon uses when building images. This means any “RUN” steps require iterating over all the files in the file system to see what has changed.

If, like me, your RUN commands only create NEW files and never modify existing files, you should be able to change the snapshot mode to “time” (i.e. --snapshotMode=time). You could also try out the “redo” mode, which is faster than the default “full”.
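
A generic sketch of such an invocation; everything except the --snapshotMode flag is a placeholder:

# "redo" is usually faster than the default "full"; "time" is only safe when
# RUN steps create new files and never modify existing ones
/kaniko/executor \
  --dockerfile=Dockerfile \
  --context=dir:///workspace \
  --destination=registry.example.com/myimage:latest \
  --snapshotMode=redo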