moby: --cache-from may corrupt config.Cmd

Output of docker version:

$ docker version -f '{{ .Server.Version }}'
18.05.0-ce

Also tested with 18.03.1-ce, same issue.

Consider the following script: https://gist.github.com/flx42/97b9f20e2ea04013843955f86e780254
These are the last two lines of its output:

+ docker inspect test -f '{{ .Config.Cmd }}'
[/bin/sh -c #(nop)  ENV VAR=value]

This is not a valid CMD, and it doesn’t match the output of docker history:

$ docker history test
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
1e24e7c47bd6        2 minutes ago       /bin/sh -c #(nop)  ENV VAR=value                0B                  
3fd9065eaf02        5 months ago        /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B                  
<missing>           5 months ago        /bin/sh -c #(nop) ADD file:093f0723fa46f6cdb…   4.15MB              

It seems --cache-from, --label and serialization (push/pull or save/load) do not play well together. If you remove the --label from the first build, it works. If you remove the save/load (and keep the image locally), it also works.

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 4
  • Comments: 18 (15 by maintainers)

Most upvoted comments

Today I ran into this very issue. I had to override the command in docker-compose to get the service running.

I really, really, REALLY hope this gets fixed. While most people run CI systems and have replicas, blue/green deployments, etc., this could still potentially bring someone down. Or, like us, it did bring the CI/staging system down, which itself was a big bummer.

And if you barely know how to use docker, you might never understand why your container suddenly isn’t running the way it’s supposed to.

So a very big +1 for a fix.

@mareksuscak 19.03(or buildx) supports “inline” cache mode (in addition to cache manifests) if you want to use --cache-from with the production images. https://github.com/docker/cli/blob/8b7c86d5b082e72cfdf5b0d0909311eacb2ff88a/docs/reference/commandline/build.md#specifying-external-cache-sources

So I would like to know if the BuildKit backend will be affected too.

BuildKit is not affected by this as it doesn’t do any config pattern matching at all. In BuildKit you would point --cache-from to a special cache manifest that only has the cache checksum graph.

Now, regarding the code path: the function restoreCachedImage seems to be called only when using --cache-from with an imported cache. This is the config of the “restored” layer: https://github.com/moby/moby/blob/4d6219264655bbb4510642e40d7914fd56962596/image/cache/cache.go#L117-L130

However, cfg is the container config, so it includes the #(nop) prefix, which I guess is added for cache comparison reasons, here: https://github.com/moby/moby/blob/4d6219264655bbb4510642e40d7914fd56962596/builder/dockerfile/internals.go#L307-L313

I think we are definitely losing the reference to the right Cmd here: https://github.com/moby/moby/blob/4d6219264655bbb4510642e40d7914fd56962596/builder/dockerfile/internals.go#L410 because it passes runConfig instead of dispatchState.runConfig (rightfully so, for cache comparison).

Since the bug involves a combination of two different packages for one specific case (“restored” cache), I didn’t see an obvious fix.

Therefore, I think the options are:

  1. Do nothing and wait for BuildKit to solve this problem.
  2. Modify the Config API type to add the saved Cmd only for this particular case. Feels extremely kludgy.
  3. Try to work around it in a less intrusive way, for instance by saving the Cmd to a label before calling the cache. Still feels like a hack.
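
As a rough sketch of what option 3 could look like: stash the real Cmd in a label before the config is overwritten with the synthetic #(nop) command, and put it back on a cache hit. The label key moby.builder.saved-cmd is made up for illustration, and Config is again a stripped-down stand-in:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a stripped-down stand-in for the container config.
type Config struct {
	Cmd    []string
	Labels map[string]string
}

// Hypothetical label key used only to round-trip the real Cmd.
const savedCmdLabel = "moby.builder.saved-cmd"

// saveCmd stashes the image's real Cmd in a label before the step
// config is given the synthetic "#(nop)" command for cache comparison.
func saveCmd(cfg *Config, realCmd []string) {
	if cfg.Labels == nil {
		cfg.Labels = map[string]string{}
	}
	b, _ := json.Marshal(realCmd) // JSON survives spaces in arguments
	cfg.Labels[savedCmdLabel] = string(b)
}

// restoreCmd restores the saved Cmd on a cache hit and removes the
// bookkeeping label so it never leaks into the final image.
func restoreCmd(cfg *Config) {
	if raw, ok := cfg.Labels[savedCmdLabel]; ok {
		var cmd []string
		if err := json.Unmarshal([]byte(raw), &cmd); err == nil {
			cfg.Cmd = cmd
		}
		delete(cfg.Labels, savedCmdLabel)
	}
}

func main() {
	// Step config after the builder injected the synthetic command.
	cfg := &Config{Cmd: []string{"/bin/sh", "-c", "#(nop)  ENV VAR=value"}}
	saveCmd(cfg, []string{"/bin/sh"}) // the Cmd the image should keep
	restoreCmd(cfg)
	fmt.Println(cfg.Cmd)
}
```

Running this prints `[/bin/sh]`, i.e. the restored config keeps the real Cmd. The obvious downside, as noted above, is that a bookkeeping label is a hack, and any consumer that inspects configs mid-build could observe it.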

Thoughts @thaJeztah @tonistiigi?

I had a little bit of time to look into the root cause in the code. First of all, I’ve come up with a case that demonstrates a more serious bug:

#!/bin/bash
set -eux

docker rmi cache test || true

docker build -t cache --label "ref=12345" - <<EOF
FROM alpine
RUN rm -f /tmp/*
EOF

docker inspect cache -f '{{ .Config.Cmd }}'

docker save cache -o cache.tar
docker rmi cache
docker load -i cache.tar

# Should still be /bin/sh
docker inspect cache -f '{{ .Config.Cmd }}'

# Same as above, but without the --label and with --cache-from
docker build --cache-from=cache -t test - <<EOF
FROM alpine
RUN rm -f /tmp/*
EOF

# *NOT* /bin/sh
docker inspect test -f '{{ .Config.Cmd }}'

The output:

[...]
+ docker inspect test -f '{{ .Config.Cmd }}'
[/bin/sh -c rm -f /tmp/*]

So, if you run docker run --rm -ti -v /tmp:/tmp test, it will delete the files in your host’s /tmp, which is not what you expected. Granted, you would have to be unlucky to end up in this situation, but it would be very puzzling if it happened.