moby: ARG before FROM in Dockerfile doesn't behave as expected

Description

It’s documented that ARG can appear before FROM, so that arguments may be substituted into image names etc.

Rather than having some ARG before and some ARG after FROM, for consistency I attempted to place all my ARG before FROM. However, to my surprise (after a lot of debugging) I determined that my arguments are always blank after FROM.

I believe the meta-arg functionality/refactoring may somehow be responsible:

https://github.com/moby/moby/commit/239c53bf836174108dbae445a394a290f5fe2898

Steps to reproduce the issue:

  1. Produce a Dockerfile such as:
ARG environment
FROM alpine:3.5
ENV ENVIRONMENT=${environment:-development}
RUN echo "$ENVIRONMENT" > /value_of_environment
  2. Build the image and run it, printing the value of the environment ARG (stored in /value_of_environment):
docker run $(docker build -q --build-arg environment=production .) cat /value_of_environment

Describe the results you received:

development

Describe the results you expected:

production

Additional information you deem important (e.g. issue happens only occasionally):

Altering the Dockerfile such that ARG comes after FROM i.e.

FROM alpine:3.5
ARG environment
ENV ENVIRONMENT=${environment:-development}
RUN echo "$ENVIRONMENT" > /value_of_environment

then running again:

docker run $(docker build -q --build-arg environment=production .) cat /value_of_environment

gives the expected output of production.

Output of docker version:

Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:31:53 2017
 OS/Arch:      darwin/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:51:55 2017
 OS/Arch:      linux/amd64
 Experimental: true

Output of docker info:

Containers: 59
 Running: 0
 Paused: 0
 Stopped: 59
Images: 370
Server Version: 17.06.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 457
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.31-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 5.818GiB
Name: moby
ID: BCV5:MEMK:BYKI:I2IU:QY2V:5DRM:F2FP:JFAG:SM46:M2WJ:73YV:3KLP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 20
 Goroutines: 40
 System Time: 2017-07-16T19:58:09.054157098Z
 EventsListeners: 1
No Proxy: *.local, 169.254/16
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 40
  • Comments: 31 (11 by maintainers)

Most upvoted comments

Irrespective of whether this was implemented this way intentionally or it’s a bug; I think it’s a bit of a usability nightmare.

It’s not clearly documented that this is the expected behaviour, and it makes for a messy Dockerfile. But more importantly, it opens a Pandora’s box of confusing edge cases.

What if I intend to use an ARG in both my FROM statement and after it? Am I expected to have multiple ARG statements referring to the same build-arg?

What happens if I use default value syntax ARG argument=some_value before FROM and just ARG argument after FROM? What is the expected value of argument after FROM if no argument build-arg was passed?

Improved documentation is always appreciated, and would have saved me some time. However, just because behaviour is documented doesn’t preclude the behaviour itself from scrutiny.

ARG has too much complexity to it. I’d argue this functionality shouldn’t have been added to the ARG keyword in the first place; it has effectively been repurposed and its behaviour is now far too nuanced. A new keyword FROMARG from the outset would have made a lot more sense.

If you want to use the same ARG before and after FROM, simply re-declare it after, e.g.:

ARG my_arg
FROM my_image:$my_arg
# This should be empty
RUN echo "my_arg is $my_arg"

# Re-declare
ARG my_arg
# This should not be empty
RUN echo "my_arg is $my_arg"

As a new user of ARG it was very unintuitive why my ARG was empty. I saw someone use an example of ARG in a Dockerfile, but they were using it in the FROM line. For me it makes sense to define any parameterisation of a Dockerfile at the very top, so I didn’t question it. Only upon rereading the docs after reading this issue do I understand why.

I would suggest a warning that ARG gets reset after FROM in the documentation, as not everyone is up to speed on multistage builds.

I lost a couple of hours to this. Intuitively I was expecting that an ARG before FROM in a multistage build would be a global ARG (for all stages). It simply gets cleared instead.

To be clear, I’m not saying I don’t understand how the current implementation works, what has been written in this issue explains it clearly enough. I’m suggesting the implementation itself is non-ideal and confusing; after all, I read the existing docs and literally cloned Docker compose, Docker client and finally Docker before working out what was going on - at which point I opened this issue.

It’s just too complicated. Adding so much complexity to the Dockerfile syntax and the corresponding documentation is simply not sustainable.

The semantics changed with multi-stage builds. The change doesn’t really have anything to do with ARG in FROM. It just happens they came out in the same release.

I don’t think it’s necessarily 100% accurate that multi-stage and ARG in FROM are independent. They should have been independent, but I think the existence of multi-stage impacted the implementation of ARG in FROM.

The properties of ARG were:

  1. It may appear after FROM.

  2. The argument defined by ARG may be used on any line following the definition.

(2. is the way Dockerfiles always worked, sequential, state is additive, never subtractive).

A feature request comes along:

I’d like to use arguments in FROM.

Reasonable enough, the two previously defined properties still hold if implemented. We now have a third property:

  3. ARG may appear before FROM.

This can cleanly be implemented, without any backwards compatibility issues. Except, it wasn’t; it could have been, but it wasn’t.

Instead, property 2. was violated: suddenly ARG can’t always be used after it’s defined. If it appears before FROM, then it can only be used in FROM, not on all subsequent lines.

That’s changing the semantics of ARG, hence why I’m suggesting it should have been FROMARG, a keyword that can only appear in the “meta section” prior to FROM.

Mind you, this constraint is artificial in nature, there’s zero reason 3. shouldn’t have been implemented cleanly. The only reason the current implementation was deemed acceptable is because multi-stage builds were also coming, and it was also violating 2., albeit in a (roughly) well-defined fashion.

Anyway, my issue is complexity; that’s subjective and given I’m not a maintainer, not for me to decide. Documentation is certainly better than nothing, so this issue may be closed if you see fit.

simply re-declare it

This is an oversimplification. You are not considering default values and the programming rule of a single source of truth.

ARG my_arg="default"
FROM my_image:$my_arg
# This should be empty
RUN echo "my_arg is $my_arg"

# Re-declare
ARG my_arg="default"
# This should not be empty
RUN echo "my_arg is $my_arg"

We now have the arg’s default value defined twice in one file - we have lost the single source of truth.

I ran into the same issue, and to underline the impact of this behaviour I want to share my example here, whose cause took a significant amount of time to figure out. It’s still totally unexpected, and I wouldn’t exactly call it a good user experience. Please, if you don’t see the necessity to change that behaviour, then at least document it as the creator of this issue suggested, so that people can stumble upon this.

docker image build \
        --build-arg NODE_VERSION="4.8.3" \
        --build-arg NPM_VERSION="4.5.0" \
        .

Does not work as expected. NPM_VERSION holds "latest".

ARG NODE_VERSION="latest"
ARG NPM_VERSION="latest"
FROM node:${NODE_VERSION}-alpine

RUN npm install -g npm@${NPM_VERSION}
...

Works as intended. NPM_VERSION holds "4.5.0".

ARG NODE_VERSION="latest"
FROM node:${NODE_VERSION}-alpine

ARG NPM_VERSION="latest"
RUN npm install -g npm@${NPM_VERSION}
...

@thaJeztah I know that’s true now, I’ve experimented with it. The issue is that it’s hugely non-obvious.

If this is expected behaviour and no one is willing to change it, then at the very least ARG ought to be deprecated (before FROM), and instead, when used prior to FROM, the syntax should be FROMARG (which must come before FROM).

This is an oversimplification. You are not considering default values

The example given actually takes care of default values;

docker build --no-cache -<<'EOF'
ARG my_arg=latest
FROM busybox:$my_arg
# This should be empty
RUN echo "my_arg is $my_arg"

# Re-declare
ARG my_arg
# This should not be empty
RUN echo "my_arg is $my_arg"
EOF

Sending build context to Docker daemon  2.048kB
Step 1/5 : ARG my_arg=latest
Step 2/5 : FROM busybox:$my_arg
 ---> 59788edf1f3e
Step 3/5 : RUN echo "my_arg is $my_arg"
 ---> Running in 029ff9c3cdc8
my_arg is 
Removing intermediate container 029ff9c3cdc8
 ---> f9135f511c84
Step 4/5 : ARG my_arg
 ---> Running in 7c9616537324
Removing intermediate container 7c9616537324
 ---> 35ccdf7ea0a9
Step 5/5 : RUN echo "my_arg is $my_arg"
 ---> Running in 1e712eef0399
my_arg is latest
Removing intermediate container 1e712eef0399
 ---> 56c25e303cb9
Successfully built 56c25e303cb9

I also posted some examples in https://github.com/moby/moby/issues/37622#issuecomment-412101935, https://github.com/moby/moby/issues/37345#issuecomment-400245466

@thaJeztah correct me if I’m wrong.

@Benjamin-Dobell after investigating this, https://github.com/moby/moby/commit/239c53bf836174108dbae445a394a290f5fe2898 is not the origin of this behavior.

Basically, after the FROM instruction all the build arguments are reset and thus aren’t available in the Dockerfile.

From what I found the purpose of ARG before FROM is to use it inside the FROM instruction https://github.com/moby/moby/pull/31352

I should note, that I’m not actually an advocate of expanding the grammar when the usage of the existing grammar can be expanded.

However, in this particular instance ARG has had its existing semantics altered; the behaviour is not additive. Previously whenever you referenced an ARG defined argument you’d have access to the value as expected. Now argument interpolation is much more context aware.

It’s extremely confusing in single-stage builds, and perhaps more so in multi-stage ones. If arguments really are tied to build stages (although I must confess I’m not sure why this is desirable), then you suddenly need to look at the previous “stage”, beyond the FROM verb.

Realistically, you can’t pass different arguments to different build stages (they’re typically provided as CLI arguments). So there’s no legitimate reason to scope arguments to build stages. Additionally:

a “cache miss” occurs upon its first usage, not its definition

So there is zero incentive to intersperse ARG definitions throughout a file. Therefore, the most logical behaviour would be to encourage all ARG definitions to be placed at the top of a file (where they can clearly be seen) and then update the behaviour to ensure there’s no funny business with build stages.

This is a horrible way to deal with something that seems to be a global value. I have a Dockerfile with multiple FROM statements and things are breaking because I can’t pass the arg values as I originally thought. Sure, maybe I should read the documentation a bit more, but it seems I am not alone in expecting this behaviour (ARG being global), so maybe things should work as the MAJORITY think they should?

I have a reverse twist on this. I remembered from the docs that ARG had to appear before FROM in order to be used in FROM, so I put an ARG before the FROM of my second builder declaration. And got an invalid-format error on the FROM line, because that ARG appeared after the first FROM in the file, and so was ignored when processing the second FROM line. So ARG-before-the-first-FROM is global for all FROM lines and not used in any other lines, while ARG-after-FROM is used only between that FROM and the next FROM. It is consistent in a way, but completely non-intuitive, so really the ARG-before-FROM ought to be named FROMARG as suggested earlier in this thread, because otherwise it just breaks expectations left and right.
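The scoping described above can be sketched as a two-stage Dockerfile. This is only an illustration: the argument names base_tag and runtime_tag and the alpine tags are hypothetical, chosen to mirror the examples elsewhere in this thread, and the comments describe the behaviour reported in this issue rather than a guarantee for every builder version.

```dockerfile
# Declared before the first FROM: usable in every FROM line, but not inside any stage.
ARG base_tag=3.5

FROM alpine:${base_tag} AS builder
# Declared inside the builder stage: visible only until the next FROM.
ARG runtime_tag=3.5
RUN echo "builder sees runtime_tag=${runtime_tag}"

# base_tag still resolves on this FROM line because it was declared before the first FROM.
# Using runtime_tag here instead would expand to nothing and leave a malformed image
# reference, which matches the invalid-format error described above.
FROM alpine:${base_tag}
# Re-declare without a value to bring the pre-FROM default into this stage.
ARG base_tag
RUN echo "final stage sees base_tag=${base_tag}"
```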

As far as this keyword behaves with multiple FROM statements, in “multi-stage” builds, ARG lets you specify different defaults for different stages, but there is no way (nor should there be) to pass different values explicitly to different stages. That’s far more convoluted than having ARGs go into effect from the keyword down, across any number of stages/FROMs.
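A minimal sketch of the per-stage defaults mentioned above, assuming a hypothetical argument named mode. Each stage declares its own default, while a single --build-arg mode=... on the command line would override both at once, since the flag has no per-stage form.

```dockerfile
FROM alpine:3.5 AS build
# Default that applies only inside the "build" stage.
ARG mode=debug
RUN echo "build stage: mode=${mode}"

FROM alpine:3.5
# A different default for the final stage; the earlier declaration does not carry over.
ARG mode=release
RUN echo "final stage: mode=${mode}"
```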

All args are used in every RUN command. If an argument changes, it breaks all cache from the very first time RUN is used.

Yikes! That also needs documenting… and changing.

It’s the opposite. Build args are defined by stage so you only need to look at the args for the current stage. Whatever you define in other stages has no effect on the current stage.

When looking at a Dockerfile, what syntax marks the beginning of a new build stage?

FROM does, and yet, somehow it accesses ARG defined prior to this line.

Please, if you don’t see the necessity to change that behaviour, then at least document it so that people can stumble upon this.

https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact https://docs.docker.com/engine/reference/builder/#scope

If this is a common pattern a PR would probably be accepted that detects this case (at least for variable substitution) and shows a warning about possible misuse.

@Benjamin-Dobell I wanted to use build-args in multistage builds to pass secure keys to intermediate build stages which would then disappear. I haven’t completely got confirmation that this is secure, but I was actually happy to see your issue.

For the record, aside from implementation details which respondents seem to be burdening you with, clearing build args – at least so they can’t be read from the build history – seems IMO to be a very important feature… well worth the complexity.

UPDATE – sigh … I guess I spoke prematurely. Multistage builds don’t help with the fact that args are written to build history.

However, in this particular instance ARG has had its semantics altered. Previously whenever you referenced an ARG defined argument you’d have access to the value as expected. Now argument interpolation is much more context aware.

The new ARG features are 100% backward compatible. No previous Dockerfile needs any changes.

then you’ve suddenly a need to look at the previous “stage”, beyond the FROM verb.

It’s the opposite. Build args are defined by stage so you only need to look at the args for the current stage. Whatever you define in other stages has no effect on the current stage.

a “cache miss” occurs upon its first usage, not its definition

All args are used in every RUN command. If an argument changes, it breaks all cache from the very first time RUN is used.
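The cache behaviour in the statement above can be sketched as follows. The argument name build_label is hypothetical. The assumption, taken from that statement, is that every RUN following an ARG sees the argument in its environment, so a changed value invalidates the cache from the first such RUN even if the command never mentions the variable.

```dockerfile
FROM alpine:3.5
# No ARG is in scope yet, so this layer should be reusable when only build_label changes.
RUN echo "prepared" > /prepared

ARG build_label=dev
# This RUN never references build_label, but it still runs with the argument in scope,
# so a different --build-arg build_label=... is expected to miss the cache here.
RUN echo "installing application"
```

Building twice with different values for --build-arg build_label and comparing which steps report "Using cache" is one way to check this against a given Docker version.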