spec: Idea for major (breaking) change: merge cache and launch directories
Note: This proposal has been edited. Comments below it may be outdated. See the edit history for previous versions.
After reviewing the various initial buildpack implementations, I’ve observed these patterns, anti-patterns, and limitations:
- Layers with similar contents tend to be created in both the cache and launch directories.
- Buildpacks tend to build abstractions that persist these layers to either or both of the directories depending on whether they are needed by buildpack code or by app code.
- Buildpacks use conventions in the build plan to indicate whether a layer should be provided to other buildpacks via the cache or made available at launch-time (see #22).
- When a layer is present in both cache and launch directories, buildpack code is often complex because of the different possible starting states of the cache, launch, and app directories (metadata vs. no-metadata, cached vs. not cached, vendored vs. not vendored).
- In some cases, ensuring that entire layers from the last build are restored may be desirable (see #8).
- Occasionally, buildpacks need to cache dependencies that shouldn’t be accessible to other buildpacks. It seems reasonable to decouple these concepts.
- Cached layers often have the same metadata as corresponding launch layers.
- It is currently impossible for a `/bin/build` script to make a symlink from the app directory to a layer that remains unbroken during both build and launch. Separately, relative symlinks from layers to the app directory are easy to break when copying layer directories between cache and launch dirs.
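The relative-symlink problem in the last point can be reproduced with a small sketch. The directory names and nesting depths below are assumptions chosen for illustration (the cache layer sits two levels below a temporary root and the launch layer three levels below it); they are not the layout any platform actually uses.

```python
import os
import shutil
import tempfile


def relative_link_survives_copy():
    """Return (valid_in_cache, valid_after_copy) for a relative layer-to-app symlink."""
    root = tempfile.mkdtemp()
    try:
        # A stand-in app directory containing one file.
        app = os.path.join(root, "app")
        os.makedirs(app)
        with open(os.path.join(app, "config.txt"), "w") as f:
            f.write("hello")

        # Hypothetical cache layer, two levels below root.
        cache_layer = os.path.join(root, "cache", "my-layer")
        os.makedirs(cache_layer)
        link = os.path.join(cache_layer, "config.txt")
        target = os.path.relpath(os.path.join(app, "config.txt"), cache_layer)
        os.symlink(target, link)  # the link text is ../../app/config.txt
        valid_in_cache = os.path.exists(link)

        # Hypothetical launch layer, three levels below root (assumed depth).
        launch_layer = os.path.join(root, "launch", "my.buildpack.id", "my-layer")
        shutil.copytree(cache_layer, launch_layer, symlinks=True)
        # os.path.exists follows the link, so a dangling link reports False.
        valid_after_copy = os.path.exists(os.path.join(launch_layer, "config.txt"))
        return valid_in_cache, valid_after_copy
    finally:
        shutil.rmtree(root)
```

With a single layers directory the layer is never copied, so a link like this would keep resolving to the same target.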
Proposal
Instead of providing separate launch and cache directories to each buildpack, we could provide a single <layers> directory with the following extra, top-level fields in each <layer>.toml:
launch = false # layer is available at launch time (default: false)
build = false # layer is available to subsequent buildpacks (default: false)
cache = false # restore layer contents on the next build (default: false)
persist-cache = false # guarantee that layer contents are recovered (default: false, ignored if !cache)
[metadata] # all user-provided metadata now nested here
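As an illustration of what a buildpack might write, here is a minimal sketch that renders a `<layer>.toml` with the four proposed fields. The function name `render_layer_toml` is hypothetical, and the sketch assumes metadata keys are valid bare TOML keys and that values are simple ASCII strings.

```python
import json


def render_layer_toml(launch=False, build=False, cache=False,
                      persist_cache=False, metadata=None):
    """Render the proposed <layer>.toml contents as a string."""
    def flag(value):
        return "true" if value else "false"

    lines = [
        f"launch = {flag(launch)}",
        f"build = {flag(build)}",
        f"cache = {flag(cache)}",
        f"persist-cache = {flag(persist_cache)}",
        "",
        "[metadata]",  # user-provided metadata is nested under this table
    ]
    for key, value in sorted((metadata or {}).items()):
        # Assumption: json.dumps quoting is used as a stand-in for a TOML
        # basic string, which holds for simple ASCII values.
        lines.append(f"{key} = {json.dumps(str(value))}")
    return "\n".join(lines) + "\n"
```

For example, `render_layer_toml(launch=True, cache=True, metadata={"version": "10.15.3"})` would describe a layer that is exported for launch and restored on the next build.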
Rules:
- Metadata written to `<layer>.toml` is always restored.
- If `launch && cache && !persist-cache`, but the local layer contents do not match the remote layer contents, then the cached local layer is deleted before the build and the remote metadata is provided. This ensures that recovered local layers are never out of sync with remote layers.
- If `!launch && cache && persist-cache`, the build fails unless/until we decide to support a persistent cache that isn’t used in the remote image.
- If a layer changes from `launch` to `!launch`, then the remote layer is deleted.
- A platform may choose to cache layers locally when `cache && persist-cache` as long as the cached layers are only restored when they are identical to the remote launch layers.
- To guarantee consistent behavior between builds, `!build` layers should always be moved such that they are inaccessible to subsequent buildpacks.
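One way to read the restore rules above is as a small decision function that the lifecycle could evaluate per layer before the build. This is only a sketch of my interpretation: the function name and the returned action strings are hypothetical, and the handling of `launch && cache && persist-cache` (use the local copy if it matches, otherwise download) is inferred from the rules rather than stated outright.

```python
def plan_layer_restore(launch, cache, persist_cache, local_matches_remote):
    """Return the pre-build action for one layer's contents.

    Metadata from <layer>.toml is assumed to be restored in every case.
    """
    if not cache:
        # persist-cache is ignored when cache is false.
        return "metadata-only"
    if persist_cache:
        if not launch:
            # !launch && cache && persist-cache is not supported (yet).
            return "fail-build"
        # A local copy may be used only if it is identical to the remote layer.
        return "restore-local" if local_matches_remote else "download-remote"
    if launch and not local_matches_remote:
        # Stale local contents are discarded; remote metadata is still provided.
        return "delete-local-use-remote-metadata"
    # Either a matching launch layer or a purely local cache layer.
    return "restore-local"
```

For example, `plan_layer_restore(True, True, False, False)` yields `"delete-local-use-remote-metadata"`, which is the case that keeps recovered local layers in sync with the remote image.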
The combined <layers> directory would continue to look like this:
my.buildpack.id/my-layer/ # directory contents
my.buildpack.id/my-layer.toml # metadata for my-layer
...
The new interface would be:
Executable: /bin/build <platform[AR]> <layers[EI]>, Working Dir: <app[AI]>
| Input | Description |
|---|---|
| `/dev/stdin` | Build plan from detection (TOML) |
| `<platform>/env/` | User-provided environment variables for build |
| `<platform>/#` | Platform-specific extensions |

| Output | Description |
|---|---|
| [exit status] | Success (0) or failure (1+) |
| `/dev/stdout` | Logs (info) |
| `/dev/stderr` | Logs (warnings, errors) |
| `<layers>/launch.toml` | Launch metadata (see File: launch.toml) |
| `<layers>/<layer>.toml` | Layer content metadata |
| `<layers>/<layer>/bin/` | Binaries for subsequent buildpacks and/or launch |
| `<layers>/<layer>/lib/` | Shared libraries for subsequent buildpacks and/or launch |
| `<layers>/<layer>/profile.d/` | Scripts sourced by bash before launch |
| `<layers>/<layer>/include/` | C/C++ headers for subsequent buildpacks |
| `<layers>/<layer>/pkgconfig/` | Search path for pkg-config for subsequent buildpacks |
| `<layers>/<layer>/env/` | Env vars for launch/build, set before env.build or env.run |
| `<layers>/<layer>/env.build/` | Env vars set for subsequent buildpacks |
| `<layers>/<layer>/env.run/` | Env vars set before profile.d scripts are sourced |
| `<layers>/<layer>/*` | Other content for subsequent buildpacks and/or launch |
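The ordering of the three env directories in the table above can be sketched as follows. Two things here are assumptions, not part of the proposal: each variable is stored as one file whose name is the variable name and whose contents are the value, and a phase-specific value simply overrides the shared one. The function names are hypothetical.

```python
import os


def load_env_dir(path):
    """Read one env directory, assuming one file per variable."""
    if not os.path.isdir(path):
        return {}
    env = {}
    for name in sorted(os.listdir(path)):
        file_path = os.path.join(path, name)
        if os.path.isfile(file_path):
            with open(file_path) as f:
                env[name] = f.read()
    return env


def layer_env(layer_dir, phase):
    """Apply env/ first, then env.build/ or env.run/ on top of it."""
    if phase not in ("build", "run"):
        raise ValueError("phase must be 'build' or 'run'")
    env = load_env_dir(os.path.join(layer_dir, "env"))
    # Applied second, so the phase-specific value wins under the override assumption.
    env.update(load_env_dir(os.path.join(layer_dir, f"env.{phase}")))
    return env
```

Calling `layer_env(layer_dir, "build")` would give the variables a subsequent buildpack sees from this layer, and `layer_env(layer_dir, "run")` those in effect before the profile.d scripts are sourced.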
READ THIS: New Behavior Introduced:
- Clearing the cache when the local layer contents don’t match the remote layer contents means that “stale” launch layers are never re-used. For example, this means that a Node.js build that jumps back and forth between two VMs would sometimes need to rebuild the node modules from scratch, even if they are cached on both VMs. This means that buildpacks need to perform objectively fewer metadata comparisons for the same effect (not just less copying). However, it is less efficient. I’m okay with this behavior change, because the previous behavior is easy to replicate with two layers if necessary (and it requires the exact same logic). In addition, this new behavior is safer and more similar to the current v2a/b globally-persistent buildpack cache behavior. Another way to understand this change is “locally-recovered launch layers must always match the previous build.”
- `persist-cache` requires downloading layers from the registry before the build, but provides a guaranteed, globally-persistent cache.
- The `<layers>/<layer>/env/` directory is split into `<layers>/<layer>/env/build/` and `<layers>/<layer>/env/run/` to make the behavior of the directory more clear and to provide a safer, more convenient way to set runtime environment variables.
Advantages:
- Buildpack code would be less complex
- Paths to the same dependency would always be the same
- Symlinks from the app dir to the layer would remain valid for build and launch
- Relative symlinks from the layers to the app dir would remain valid for build and launch
- Local disk usage would be reduced by as much as 50%
- Possibly safer due to guarantee that data intended for launch always matches the previous build
Disadvantages:
- Slightly more TOML writing needed (when providing dependencies to subsequent buildpacks)
- More complicated for the lifecycle to implement
- Less efficient when a “cached-launch” layer is used instead of separate cache and launch layers, because “stale” launch layers are never re-used
- Less efficient if buildpacks use `persist-cache` due to registry downloading
Possible Extensions:
- We could allow `!launch && cache && persist-cache` in the future using a dedicated image repository (or tag in the same repository). This would be entirely backwards-compatible when introduced.
Thoughts? If desired, we should make this change before ratifying v3.0.0 of the specification.
@nebhale @jkutner @hone @ekcasey @dgodd @jchesterpivotal @ameyer-pivotal
About this issue
- State: closed
- Created 6 years ago
- Reactions: 1
- Comments: 16 (14 by maintainers)
I am very excited about the simplification this allows for buildpack authors, and removing their code to copy between cache and launch layers.
I don’t personally object to `cache-from = "launch"`, but was surprised at your including it, since:

- `cache-from = "launch"`, this spec appears to include all of the abilities of the current specification but with easier use for buildpack authors

My feeling is to try and get some benchmarks for performance of `cache-from = "launch"` and consider that tiny portion of this spec based on that.

Everything else looks great to me.
Update: After considering the different combinations of options and applying them to various buildpack use cases, I think we can reduce the four options to just three:

- `launch = true` - for launch
- `build = true` - for subsequent buildpacks
- `cache = true` - recover layer from last build

Where:

- `launch && cache`, the layer is always restored (either from the VM or remote layer)
- `build && cache`, the layer is the last copy available locally
- `launch && !cache`, only metadata is restored
- `build && !cache`, the layer is not kept

This should further reduce buildpack complexity.
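The four combinations in this update can be written as a small lookup. This is a sketch with a hypothetical function name and hypothetical outcome strings. The case where neither `launch` nor `build` is set is not covered by the update, so the sketch returns an empty list for it.

```python
def restore_behavior(launch, build, cache):
    """List what is recovered from the previous build for each enabled flag."""
    outcomes = []
    if launch:
        outcomes.append(
            "launch: layer restored from VM or remote" if cache
            else "launch: only metadata restored"
        )
    if build:
        outcomes.append(
            "build: last local copy restored" if cache
            else "build: layer not kept"
        )
    # Neither flag set: behavior is unspecified in the update, so nothing is listed.
    return outcomes
```

A layer with `launch = true`, `build = true`, and `cache = true` would get both the launch and the build outcome, since the flags are independent.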