dagger: Lazy executions are confusing to understand and sometimes don't work as expected
Summary
Developers are often confused by a property of the Dagger engine called “laziness”: pipelines are executed at the latest possible moment, to maximize performance. There are several dimensions to this problem; below is an overview of each one, with the status of possible solutions.
| Issues | Proposals |
|---|---|
| No withExec | |
| Dockerfile build (without exec) | |
| Implicit query execution | |
| Multiple ways to execute | “Pipeline builder” model |
| Documentation | |
Issue: no withExec
We had some users report that part of their pipeline wasn’t being executed, for which they had to add a WithExec(nil) statement for it to work:
_, err := c.Container().Build(src).ExitCode(ctx) // doesn't work
_, err := c.Container().From("alpine").ExitCode(ctx) // same thing
Explanation
Users may assume that since they know there’s an Entrypoint/Cmd in the docker image it should work, but it’s just updating the dagger container metadata. There’s nothing to run, it’s equivalent to the following:
_, err := ctr.
	WithEntrypoint([]string{}).
	WithDefaultArgs(dagger.ContainerWithDefaultArgsOpts{
		Args: []string{"/bin/sh"},
	}).
	ExitCode(ctx) // nothing to execute!
ExitCode and Stdout only return something for the last executed command. That means the equivalent of a RUN instruction in a Dockerfile or running a container with docker run.
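The distinction between container metadata and an executed command can be sketched with a toy model. This is not the Dagger SDK: the `container` type and the `from`, `withExec` and `lastExec` names are hypothetical stand-ins, chosen only to show why there is nothing to report until a command has been recorded.

```go
package main

import (
	"errors"
	"fmt"
)

// container is a hypothetical stand-in for a Dagger container (not the SDK type).
type container struct {
	entrypoint  []string
	defaultArgs []string
	execs       [][]string // commands recorded by withExec
}

// errNoExec is returned when a result is requested but no command was recorded.
var errNoExec = errors.New("no command has been executed")

// from only records image metadata (entrypoint and default args); nothing runs.
func from(entrypoint, defaultArgs []string) container {
	return container{entrypoint: entrypoint, defaultArgs: defaultArgs}
}

// withExec records a command. A nil argument means "entrypoint + default args".
func (c container) withExec(args []string) container {
	if args == nil {
		args = append(append([]string{}, c.entrypoint...), c.defaultArgs...)
	}
	// Copy the slice so the original container value is left untouched.
	c.execs = append(append([][]string{}, c.execs...), args)
	return c
}

// lastExec returns the last recorded command, or errNoExec if there is none.
func (c container) lastExec() ([]string, error) {
	if len(c.execs) == 0 {
		return nil, errNoExec
	}
	return c.execs[len(c.execs)-1], nil
}

func main() {
	ctr := from(nil, []string{"/bin/sh"})

	// Metadata only: there is no command whose exit code could be returned.
	if _, err := ctr.lastExec(); err != nil {
		fmt.Println("without withExec:", err)
	}

	// withExec(nil) turns the metadata into an actual command.
	cmd, _ := ctr.withExec(nil).lastExec()
	fmt.Println("with withExec(nil):", cmd)
}
```

In this model, `from` plays the role of `From`/`Build`, and `lastExec` plays the role of a field such as `ExitCode` that only makes sense after an execution.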
Workaround
Add a WithExec() to tell dagger to execute the container:
_, err := client.Container().
Build(src).
+ WithExec(nil).
ExitCode(ctx)
The empty (nil) argument to WithExec will execute the entrypoint and default args configured in the dagger container.
Note: If you replace the .ExitCode() with a Publish(), you see that Build() is called and the image is published, because Publish doesn’t depend on execution but Build is still a dependency.
The same is true for a bound service:
db := client.Container().From("postgres").
WithExposedPort(5432).
+ WithExec(nil)
ctr := app.WithServiceBinding("db", db)
Here, WithServiceBinding clearly needs to execute/run the postgres container so that app can connect to it, so we need the WithExec here too (with nil for default entrypoint and arguments).
Proposals
To avoid astonishment, a fix was added (#4716) to raise an error when fields like .ExitCode or .WithServiceBinding (that depend on WithExec) are used on a container that hasn’t been executed.
However, perhaps a better solution is to implicitly execute the entrypoint and default arguments because if you’re using a field that depends on an execution, we can assume that you mean to execute the container.
This is what #4833 proposes, meaning the following would now work as expected by users:
// ExitCode → needs execution so use default exec
_, err := c.Container().From("alpine").ExitCode(ctx)
// WithServiceBinding → needs execution so use default exec
db := client.Container().From("postgres").WithExposedPort(5432)
ctr := app.WithServiceBinding("db", db)
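As a rough illustration of the fallback that #4833 proposes, the sketch below picks the command to run for a field that needs an execution. It is a toy model rather than the engine's implementation: the `container` type and `commandToRun` are hypothetical, and the entrypoint and argument values are only illustrative.

```go
package main

import "fmt"

// container is a hypothetical stand-in for a Dagger container (not the SDK type).
type container struct {
	entrypoint  []string
	defaultArgs []string
	exec        []string // command set explicitly; nil when there is none
}

// commandToRun returns the explicit command when there is one. Otherwise it
// falls back to entrypoint + default args, mimicking the proposed implicit exec.
func (c container) commandToRun() []string {
	if c.exec != nil {
		return c.exec
	}
	cmd := append([]string{}, c.entrypoint...)
	return append(cmd, c.defaultArgs...)
}

func main() {
	// Illustrative values only.
	db := container{
		entrypoint:  []string{"docker-entrypoint.sh"},
		defaultArgs: []string{"postgres"},
	}
	fmt.Println("implicit:", db.commandToRun())

	db.exec = []string{"postgres", "--version"}
	fmt.Println("explicit:", db.commandToRun())
}
```

The design choice is that an explicit command always wins; the default is only used when a field that depends on execution is called on a container that has none.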
### No `withExec`
- [x] #4716
- [ ] #4833
Issue: Dockerfile build (without exec)
Some users just want to test if a Dockerfile build succeeds or not, and don’t want to execute the entrypoint (e.g., a long-running executable):
_, err = client.Container().Build(src).ExitCode(ctx)
In this case users are just using ExitCode as a way to trigger the build when they also don’t want to Publish. It’s the same problem as above, but the intent is different.
Workarounds
With #4919, you’ll be able to skip the entrypoint:
_, err = client.Container().
Build(src).
WithExec([]string{"/bin/true"}, dagger.ContainerWithExecOpts{
SkipEntrypoint: true,
}).
ExitCode(ctx)
But executing the container isn’t even needed to build, so ExitCode isn’t a good choice here. It’s just simpler to use another field such as:
- _, err = client.Container().Build(src).ExitCode(ctx)
+ _, err = client.Container().Build(src).Rootfs().Entries(ctx)
However, this isn’t intuitive and is clearly a workaround: these fields aren’t meant for triggering a build.
Proposal
Perhaps the best solution is to use a general synchronization primitive (#5065) that simply forces resolving the laziness in the pipeline, especially since the result is discarded in the above workarounds:
- _, err = client.Container().Build(src).ExitCode(ctx)
+ _, err = client.Container().Build(src).Sync(ctx)
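What a synchronization primitive does can be sketched without the SDK. In the toy model below, `pipeline`, `with` and `sync` are hypothetical names, not the Dagger API: steps are recorded lazily, and `sync` only forces them to run and reports the first error, without adding an execution or returning any other value.

```go
package main

import (
	"errors"
	"fmt"
)

// errBuildFailed is a sample failure used by the demo in main.
var errBuildFailed = errors.New("build failed")

// step is one lazy operation in a hypothetical pipeline.
type step struct {
	name string
	run  func() error
}

// pipeline is a toy model of a lazy chain (not the Dagger API).
type pipeline struct{ steps []step }

// with returns a new pipeline with one more recorded step; nothing runs yet.
func (p pipeline) with(name string, run func() error) pipeline {
	steps := append(append([]step{}, p.steps...), step{name, run})
	return pipeline{steps: steps}
}

// sync forces every recorded step to run, in order, and stops at the first
// error. It adds no step of its own and returns nothing but the error.
func (p pipeline) sync() error {
	for _, s := range p.steps {
		if err := s.run(); err != nil {
			return fmt.Errorf("%s: %w", s.name, err)
		}
	}
	return nil
}

func main() {
	built := false
	ok := pipeline{}.with("build", func() error { built = true; return nil })
	fmt.Println("built before sync:", built)
	fmt.Println("sync error:", ok.sync())
	fmt.Println("built after sync:", built)

	bad := pipeline{}.with("build", func() error { return errBuildFailed })
	fmt.Println("sync error:", bad.sync())
}
```

Because `sync` never appends a step, a pipeline that only builds stays a pipeline that only builds, which is the intent behind replacing `ExitCode` in the case above.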
### `Dockerfile` build (without exec)
- [x] #4919
- [ ] #5065
Issue: Implicit query execution
Some functions are “lazy” and don’t result in a query execution (e.g., From, Build, WithXXX), while others execute (e.g., ExitCode, Stdout, Publish).
It’s not clear to some users which is which.
Explanation
The model is implicit, with a “rule of thumb” in each language to hint which ones execute:
- Go: functions taking a context and returning an error
- Python and Node.js: async functions that need an await
Essentially, each SDK’s codegen (the feature that introspects the API and builds a dagger client that is idiomatic in each language) transforms leaf fields into an implicit API request when called, and returns the value from the response.
So the “rule of thumb” is based on the need to make a request to the GraphQL server. The problem is that this may not be immediately clear, and the syntax varies by language, so there are different “rules” to understand.
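The Go rule of thumb can be sketched with a toy client. Nothing here is generated Dagger code: `server`, `query`, `field` and `resolve` are hypothetical names. Chained calls only extend a selection and send nothing; the leaf call, which takes a context and returns an error, is the one that makes a request.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// server counts requests; it is a hypothetical stand-in for the GraphQL API.
type server struct{ requests int }

// query is a lazily built selection path.
type query struct {
	srv  *server
	path []string
}

// field appends a selection. No context, no error: it never contacts the server.
func (q query) field(name string) query {
	path := append(append([]string{}, q.path...), name)
	return query{srv: q.srv, path: path}
}

// resolve models a leaf field: it takes a context, returns an error,
// and sends exactly one request for the whole selection.
func (q query) resolve(ctx context.Context) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	q.srv.requests++
	return strings.Join(q.path, "."), nil
}

// demo builds a chain and resolves it, returning the request count
// before and after the leaf call together with the selection sent.
func demo() (before, after int, sent string) {
	srv := &server{}
	q := query{srv: srv}.field("container").field("from").field("withExec").field("stdout")
	before = srv.requests
	sent, _ = q.resolve(context.Background())
	after = srv.requests
	return before, after, sent
}

func main() {
	before, after, sent := demo()
	fmt.Println("requests before leaf call:", before)
	fmt.Println("requests after leaf call:", after)
	fmt.Println("selection sent:", sent)
}
```

The signature is the only visible hint in this model, which is why the rule differs per language: in Python or Node.js the equivalent hint would be the need for an await.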
This was discussed in:
Proposal
The same Pipeline Synchronization proposal from the previous issue helps make this a bit more explicit:
_, err := ctr.Sync(ctx)
### Implicit query execution
- [x] #3555
- [x] #3558
- [ ] #5065
Issue: Multiple ways to execute
“Execution” sometimes means different things:
- Container execution (i.e., Container.withExec)
- Query execution (i.e., making a request to the GraphQL API)
- “Engine” execution (i.e., doing actual work in BuildKit)
ID fields such as Container.ID, for example, make a request to the API but don’t do any actual work building the container. We reduced the scope of the issue in the SDKs by avoiding passing IDs around (#3558), and keeping the pipeline as lazy as possible until an output is needed (see Implicit query execution above).
More importantly, users have been using .ExitCode(ctx) as the go-to solution to “synchronize” the laziness, but as we’ve seen in the above issues, it triggers the container to execute and there are cases where you don’t want to do that.
However, adding the general .Sync() (#4205) to fix that may make people shift to using it as the go-to solution to “resolve” the laziness instead (“synchronize”), which actually makes sense. The problem is that we now go back to needing WithExec(nil) because .Sync() can’t assume you want to execute the container.
That’s a catch-22 situation! There’s no single execute function to “rule them all”.
It requires the user to have a good enough grasp of these concepts and the Dagger model to choose the right function for each purpose:
// exec the container (build since it's a dependency)
c.Container().Build(src).ExitCode(ctx)
// just build (don't exec)
c.Container().Build(src).Sync(ctx)
Proposal
During the “implicit vs explicit” discussions, the proposal for the most explicit solution was for a “pipeline builder” model (https://github.com/dagger/dagger/issues/3555#issuecomment-1301327344).
The idea was to make a clear separation between building the lazy pipeline and executing the query:
// ExitCode doesn't implicitly execute query here! Still lazy.
// Just setting expected output, and adding exec as a dependency.
// Build is a dependency for exec so it also runs.
q := c.Container().Build(src).ExitCode()
// Same as above but don't care about output, just exec.
q := c.Container().Build(src).WithExec(nil)
// Same as above but don't want to exec, just build.
q := c.Container().Build(src)
// Only one way to execute query!
client.Query(q)
Downsides
- It’s a big breaking change so it’s not seen as a viable solution now
- No great solution to grab output values
- More boilerplate for simple things
Solution
Embrace the laziness!
Issue: Documentation
We have a guide on Lazy Evaluation but it’s focused on the GraphQL API and isn’t enough to explain the above issues.
We need better documentation to help users understand the “lazy DAG” model (https://github.com/dagger/dagger/issues/3617). It’s even more important if the “pipeline builder” model above isn’t viable.
### Documentation
- [x] #3622
- [ ] #3617
Affected users
These are only some examples of users that were affected by this:
- from @RonanQuigley
DX or a Bug: In order to have a dockerfile’s entrypoint executed, why did we need to use a dummy withExec? There was a unanimous 😩 in our team call after we figured this out.
- https://discord.com/channels/707636530424053791/708371226174685314/1079926439064911972
- https://discord.com/channels/707636530424053791/1080160708123185264/1080174051965812766
- #5010
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 27 (24 by maintainers)
As an end user I can say that I’ve switched to using sync() in my pipelines and it makes a lot more sense now than it did before. I feel pretty confident that the UX improvement has resolved the issue, at least to me, though the docs should be improved to explain what’s going on, as Helder mentioned. So I will second closing this issue and directing further action to documentation improvement
Relaying a report from @morlay in Discord who was affected by the change to raise an error.
They were using Build().ExitCode(ctx) to force the Build() to happen. Their intent is actually to ignore its ENTRYPOINT or CMD and just see if the image is able to build. I suggested adding WithExec(nil) but they said that causes it to hang, because it runs the entrypoint.
It seems like we need an intuitive way of un-lazying Build(). I’m guessing this will come up for anyone that was using Dockerfile as their test runner before and needs to wrap it so they can gradually adopt Dagger. I suggested doing something silly like Build().Rootfs().Entries(ctx) for now.
@samalba It’s a small enough change so I’ll just submit a PR for my proposal at the end of https://github.com/dagger/dagger/issues/4668#issuecomment-1453732766 and we can decide its fate in the PR
Yeah, that makes sense.
I just updated the description with a compilation of all the issues that I see related to this, with their explanations, workarounds and proposed solutions.
Notes
I meant it’s not possible because methods that send a request to the API have different signatures (context, return/raise error, coroutine/promise).
Oh boy. This one brings back memories…