dagger: Lazy executions are confusing to understand and sometimes don't work as expected

Summary

Developers are often confused by a property of the Dagger engine called “laziness”: pipelines are executed at the latest possible moment, to maximize performance. There are several dimensions to this problem; below is an overview of each one and the status of possible solutions.

| Issues | Proposals |
| --- | --- |
| No withExec | #4833 |
| Dockerfile build (without exec) | #5065 |
| Implicit query execution | #5065 |
| Multiple ways to execute | “Pipeline builder” model |
| Documentation | #3617 |

    Issue: no withExec

    Some users reported that part of their pipeline wasn’t being executed, and that they had to add a WithExec(nil) statement to make it work:

    _, err := c.Container().Build(src).ExitCode(ctx)  // doesn't work
    _, err := c.Container().From("alpine").ExitCode(ctx)  // same thing
    

    Explanation

    Users may assume that since they know there’s an Entrypoint/Cmd in the docker image it should work, but it’s just updating the dagger container metadata. There’s nothing to run, it’s equivalent to the following:

    _, err := ctr.
        WithEntrypoint([]string{}).
        WithDefaultArgs(dagger.ContainerWithDefaultArgsOpts{
            Args: []string{"/bin/sh"},
        }).
        ExitCode(ctx) // nothing to execute!
    

    ExitCode and Stdout only return something for the last executed command, that is, the equivalent of a RUN instruction in a Dockerfile or of running a container with docker run.

    Workaround

    Add a WithExec() to tell dagger to execute the container:

    _, err := client.Container().
        Build(src).
    +   WithExec(nil).
        ExitCode(ctx)
    

    The empty (nil) argument to WithExec will execute the entrypoint and default args configured in the dagger container.
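As a rough illustration of that rule, the sketch below resolves the command a container would end up running. It is a toy model and not Dagger's implementation: `resolveCommand` and its parameters are hypothetical names, the entrypoint and argument values are only examples, and the `skipEntrypoint` flag mirrors the SkipEntrypoint option mentioned under the Dockerfile build issue.

```go
package main

import "fmt"

// resolveCommand is a hypothetical helper (not part of the Dagger SDK) that
// models which command runs for WithExec(args), as described in the text.
func resolveCommand(entrypoint, defaultArgs, args []string, skipEntrypoint bool) []string {
	// No explicit args, as in WithExec(nil): fall back to the default args.
	if len(args) == 0 {
		args = defaultArgs
	}
	if skipEntrypoint {
		return append([]string{}, args...)
	}
	// Otherwise the entrypoint is prepended to the arguments.
	cmd := append([]string{}, entrypoint...)
	return append(cmd, args...)
}

func main() {
	// Example metadata only; real images define their own values.
	entrypoint := []string{"docker-entrypoint.sh"}
	defaultArgs := []string{"postgres"}

	fmt.Println(resolveCommand(entrypoint, defaultArgs, nil, false))
	fmt.Println(resolveCommand(entrypoint, defaultArgs, []string{"psql", "--version"}, false))
	fmt.Println(resolveCommand(entrypoint, defaultArgs, []string{"/bin/true"}, true))
}
```

The three calls correspond to WithExec(nil), WithExec with explicit arguments, and WithExec with the entrypoint skipped.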

    Note: if you replace .ExitCode() with Publish(), you will see that Build() is called and the image is published, because Publish doesn’t depend on an execution, but Build is still a dependency.

    The same is true for a bound service:

    db := client.Container().From("postgres").
         WithExposedPort(5432).
    +    WithExec(nil)
    ctr := app.WithServiceBinding("db", db)
    

    Here, WithServiceBinding clearly needs to execute/run the postgres container so that app can connect to it, so we need the WithExec here too (with nil for default entrypoint and arguments).

    Proposals

    To avoid astonishment, a fix was added (#4716) to raise an error when fields like .ExitCode or .WithServiceBinding (that depend on WithExec) are used on a container that hasn’t been executed.

    However, perhaps a better solution is to implicitly execute the entrypoint and default arguments because if you’re using a field that depends on an execution, we can assume that you mean to execute the container.

    This is what #4833 proposes, meaning the following would now work as expected by users:

    // ExitCode → needs execution so use default exec
    _, err := c.Container().From("alpine").ExitCode(ctx)
    
    // WithServiceBinding → needs execution so use default exec
    db := client.Container().From("postgres").WithExposedPort(5432)
    ctr := app.WithServiceBinding("db", db)
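To make the difference between the two behaviors concrete, here is a minimal sketch of a toy container model. Everything in it is hypothetical (the `container` type, `withExec`, `lastExec`, and the `implicitExec` flag are not SDK names); it only contrasts raising an error, as in #4716, with falling back to the default command, as proposed in #4833.

```go
package main

import (
	"errors"
	"fmt"
)

// container is a toy stand-in for a Dagger container (hypothetical type).
type container struct {
	defaultCmd []string   // entrypoint + default args from image metadata
	execs      [][]string // commands queued by withExec
}

var errNoExec = errors.New("no command has been executed")

// withExec returns a copy with cmd queued; nil means "use the default command".
func (c container) withExec(cmd []string) container {
	if cmd == nil {
		cmd = c.defaultCmd
	}
	execs := append([][]string{}, c.execs...)
	c.execs = append(execs, cmd)
	return c
}

// lastExec models a field that depends on an execution (like ExitCode).
// With implicitExec=false it errors when nothing was executed (#4716 behavior);
// with implicitExec=true it falls back to the default command (#4833 proposal).
func (c container) lastExec(implicitExec bool) ([]string, error) {
	if len(c.execs) == 0 {
		if !implicitExec {
			return nil, errNoExec
		}
		c = c.withExec(nil)
	}
	return c.execs[len(c.execs)-1], nil
}

func main() {
	alpine := container{defaultCmd: []string{"/bin/sh"}}

	_, err := alpine.lastExec(false)
	fmt.Println("strict:", err)

	cmd, err := alpine.lastExec(true)
	fmt.Println("implicit:", cmd, err)
}
```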
    
    ### No `withExec`
    - [x] #4716
    - [ ] #4833
    

    Issue: Dockerfile build (without exec)

    Some users just want to test if a Dockerfile build succeeds or not, and don’t want to execute the entrypoint (e.g., long running executable):

    _, err = client.Container().Build(src).ExitCode(ctx)
    

    In this case users are just using ExitCode as a way to trigger the build when they also don’t want to Publish. It’s the same problem as above, but the intent is different.

    Workarounds

    With #4919, you’ll be able to skip the entrypoint:

    _, err = client.Container().
        Build(src).
        WithExec([]string{"/bin/true"}, dagger.ContainerWithExecOpts{
            SkipEntrypoint: true,
        }).
        ExitCode(ctx)
    

    But executing the container isn’t even needed to build, so ExitCode isn’t a good choice here. It’s just simpler to use another field such as:

    - _, err = client.Container().Build(src).ExitCode(ctx)
    + _, err = client.Container().Build(src).Rootfs().Entries(ctx)
    

    However, this isn’t intuitive and is clearly a workaround: Entries isn’t meant for this.

    Proposal

    Perhaps the best solution is to use a general synchronization primitive (#5065) that simply forces resolving the laziness in the pipeline, especially since the result is discarded in the above workarounds:

    - _, err = client.Container().Build(src).ExitCode(ctx)
    + _, err = client.Container().Build(src).Sync(ctx)
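The idea behind such a primitive can be sketched with a toy lazy pipeline. This is an assumption-laden illustration, not the engine's behavior: the `pipeline` type and its `with` and `sync` methods are hypothetical, and "running" a step here just appends its name to a log.

```go
package main

import "fmt"

// pipeline is a toy lazy pipeline (hypothetical type): steps are recorded,
// not run, until something forces evaluation.
type pipeline struct {
	steps []string
	ran   *[]string // log of steps the toy "engine" actually ran
}

func newPipeline() pipeline {
	return pipeline{ran: &[]string{}}
}

// with records a step lazily and returns the extended pipeline.
func (p pipeline) with(step string) pipeline {
	steps := append([]string{}, p.steps...)
	p.steps = append(steps, step)
	return p
}

// sync forces every recorded step to run, and nothing else: unlike an
// ExitCode-style field, it neither requires nor adds a container execution.
func (p pipeline) sync() error {
	for _, s := range p.steps {
		*p.ran = append(*p.ran, s)
	}
	return nil
}

func main() {
	p := newPipeline().with("build")
	fmt.Println("before sync:", *p.ran)

	if err := p.sync(); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println("after sync:", *p.ran)
}
```

In this model the build only happens once sync is called, and no exec step is added along the way.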
    
    ### `Dockerfile` build (without exec)
    - [x] #4919
    - [ ] #5065
    

    Issue: Implicit query execution

    Some functions are “lazy” and don’t result in a query execution (e.g., From, Build, WithXXX), while others execute (e.g., ExitCode, Stdout, Publish).

    It’s not clear to some users which is which.

    Explanation

    The model is implicit, with a “rule of thumb” in each language to hint which ones execute:

    • Go: functions taking a context and returning an error
    • Python and Node.js: async functions that need an await

    Essentially, each SDK’s codegen (the feature that introspects the API and builds a dagger client that is idiomatic in each language) transforms leaf fields into an implicit API request when called, and returns the value from the response.

    So the “rule of thumb” is based on the need to make a request to the GraphQL server. The problem is that this may not be immediately clear, and because the syntax varies by language, there are different “rules” to understand.

    This was discussed in:

    Proposal

    The same Pipeline Synchronization proposal from the previous issue helps make this a bit more explicit:

    _, err := ctr.Sync(ctx)
    
    ### Implicit query execution
    - [x] #3555
    - [x] #3558
    - [ ] #5065
    

    Issue: Multiple ways to execute

    “Execution” sometimes means different things:

    • Container execution (i.e., Container.withExec)
    • Query execution (i.e., making a request to the GraphQL API)
    • “Engine” execution (i.e., doing actual work in BuildKit)

    The ID fields like Container.ID for example, make a request to the API, but don’t do any actual work building the container. We reduced the scope of the issue in the SDKs by avoiding passing IDs around (#3558), and keeping the pipeline as lazy as possible until an output is needed (see Implicit query execution above).

    More importantly, users have been using .ExitCode(ctx) as the go-to solution to “synchronize” the laziness, but as we’ve seen in the above issues, it triggers the container to execute and there are cases where you don’t want to do that.

    However, adding the general .Sync() (#4205) to fix that may make people shift to using it as the go-to solution to “resolve” (“synchronize”) the laziness instead, which actually makes sense. The problem is that we then go back to needing WithExec(nil), because .Sync() can’t assume you want to execute the container.

    That’s a catch-22 situation! There’s no single execute function to “rule them all”.

    It requires the user to have a good enough grasp of these concepts and the Dagger model to choose the right function for each purpose:

    // exec the container (build since it's a dependency)
    c.Container().Build(src).ExitCode(ctx)
    
    // just build (don't exec)
    c.Container().Build(src).Sync(ctx)
    

    Proposal

    During the “implicit vs explicit” discussions, the proposal for the most explicit solution was for a “pipeline builder” model (https://github.com/dagger/dagger/issues/3555#issuecomment-1301327344).

    The idea was to make a clear separation between building the lazy pipeline and executing the query:

    // ExitCode doesn't implicitly execute query here! Still lazy.
    // Just setting expected output, and adding exec as a dependency.
    // Build is a dependency for exec so it also runs.
    q := c.Container().Build(src).ExitCode()
    
    // Same as above but don't care about output, just exec.
    q := c.Container().Build(src).WithExec(nil)
    
    // Same as above but don't want to exec, just build.
    q := c.Container().Build(src)
    
    // Only one way to execute query!
    client.Query(q)
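A minimal sketch of what such a model could look like is shown here. All names are hypothetical (`query`, `client`, `run`, and the untyped result map are assumptions made for illustration, not a proposed API); the untyped map also hints at why grabbing output values is awkward in this model.

```go
package main

import "fmt"

// query is a toy description of work to do (hypothetical type). Every
// method is lazy, including exitCode, which only marks the wanted output.
type query struct {
	steps  []string
	output string // name of the requested output, "" if none
}

func (q query) step(name string) query {
	steps := append([]string{}, q.steps...)
	q.steps = append(steps, name)
	return q
}

func (q query) build() query { return q.step("build") }

func (q query) withExec() query { return q.step("exec") }

func (q query) exitCode() query {
	q = q.withExec() // exec is a dependency of the exit code
	q.output = "exitCode"
	return q
}

// client is the single place where queries run in this model.
type client struct{ ran []string }

// run executes q and returns its outputs in an untyped map.
func (c *client) run(q query) map[string]any {
	c.ran = append(c.ran, q.steps...)
	out := map[string]any{}
	if q.output != "" {
		out[q.output] = 0 // the toy engine always "succeeds"
	}
	return out
}

func main() {
	c := &client{}
	q := query{}.build().exitCode()
	fmt.Println("ran before run:", c.ran)

	out := c.run(q)
	code, ok := out["exitCode"].(int) // caller must know the output's type
	fmt.Println("ran:", c.ran, "exit code:", code, ok)

	// Just build, no exec and no output.
	fmt.Println(c.run(query{}.build()))
}
```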
    

    Downsides

    • It’s a big breaking change so it’s not seen as a viable solution now
    • No great solution to grab output values
    • More boilerplate for simple things

    Solution

    Embrace the laziness!

    Issue: Documentation

    We have a guide on Lazy Evaluation but it’s focused on the GraphQL API and isn’t enough to explain the above issues.

    We need better documentation to help users understand the “lazy DAG” model (https://github.com/dagger/dagger/issues/3617). It’s even more important if the “pipeline builder” model above isn’t viable.

    ### Documentation
    - [x] #3622
    - [ ] #3617
    

    Affected users

    These are only some examples of users that were affected by this:

    About this issue

    • State: closed
    • Created a year ago
    • Comments: 27 (24 by maintainers)

    Most upvoted comments

    As an end user I can say that I’ve switched to using sync() in my pipelines and it makes a lot more sense now than it did before. I feel pretty confident that the UX improvement has resolved the issue, at least for me, though the docs should be improved to explain what’s going on, as Helder mentioned. So I will second closing this issue and directing further action to documentation improvement.

    Relaying a report from @morlay in Discord who was affected by the change to raise an error.

    They were using Build().ExitCode(ctx) to force the Build() to happen. Their intent is actually to ignore its ENTRYPOINT or CMD and just see if the image is able to build. I suggested adding WithExec(nil) but they said that causes it to hang, because it runs the entrypoint.

    It seems like we need an intuitive way of un-lazying Build(). I’m guessing this will come up for anyone that was using Dockerfile as their test runner before and needs to wrap it so they can gradually adopt Dagger. I suggested doing something silly like Build().Rootfs().Entries(ctx) for now.

    @samalba It’s a small enough change so I’ll just submit a PR for my proposal at the end of https://github.com/dagger/dagger/issues/4668#issuecomment-1453732766 and we can decide its fate in the PR

    “Container is an Image (and inherits all properties), but Image is not a Container”.

    Yeah, that makes sense.

    I just updated the description with a compilation of all the issues that I see related to this, with their explanations, workarounds and proposed solutions.

    Notes

    • The summary acts as a table of contents so you can click the issue name to jump to its section.
    • There’s a narrative if you read in order, where one issue builds upon the previous one.
    • The action steps are to implement the proposed solutions in the summary.
    • Let me know if anyone has anything to add or change.
    • Whenever possible, use the proposals to further discussion on those solutions.

    I meant it’s not possible because methods that send a request to the API have different signatures (context, return/raise error, coroutine/promise).

    We have similar rules, that were discussed in:

    Oh boy. This one brings back memories…