buildkit: Unable to use Buildkit with Windows containers

I’m using the Buildkit version that comes bundled with Docker for Windows 18.06.1 and am having trouble running it with Windows containers. In the log below, a very simple build succeeds without Buildkit and then fails once I enable it. The localized error message “Det går inte att hitta filen” roughly translates to “Unable to find the file”. I’ve had success running Buildkit on the same system when running Linux containers. A minimal project that reproduces the error can be found here: test.zip.

PS C:\test> docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:21:34 2018
 OS/Arch:           windows/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.24)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:36:40 2018
  OS/Arch:          windows/amd64
  Experimental:     true
PS C:\test> ls


    Directory: C:\test


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----       2018-09-11     15:38             74 Dockerfile
-a----       2018-09-11     15:39             23 test.txt


PS C:\test> type .\Dockerfile
FROM microsoft/nanoserver:1803
COPY test.txt /test.txt
RUN type test.txt

PS C:\test> $Env:DOCKER_BUILDKIT=0
PS C:\test> docker build -t test .
Sending build context to Docker daemon  3.072kB
Step 1/3 : FROM microsoft/nanoserver:1803
 ---> 693ff1719e39
Step 2/3 : COPY test.txt /test.txt
 ---> 3cb8bc9e5e2e
Step 3/3 : RUN type test.txt
 ---> Running in 376f873629fd
This is a test message!Removing intermediate container 376f873629fd
 ---> 0cce47564a2d
Successfully built 0cce47564a2d
Successfully tagged test:latest

PS C:\test> $Env:DOCKER_BUILDKIT=1
PS C:\test> docker build -t test .
[+] Building 0.2s (2/2) FINISHED
 => local://dockerfile (Dockerfile)          0.1s
 => => transferring dockerfile: 31B          0.0s
 => local://context (.dockerignore)          0.1s
 => => transferring context: 2B              0.0s
failed to read dockerfile: open C:\ProgramData\Docker\tmp\buildkit-mount977689469\Dockerfile: Det går inte att hitta filen.

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 32
  • Comments: 77 (8 by maintainers)

Most upvoted comments

Any plans to support it?

When is buildkit support coming for windows?

The failure I hit in my previous run turned out to be a bug in hcsshim, for which I have posted a fix at microsoft/hcsshim#752.

So now I am able to build a trivial Dockerfile. So trivial it’s pointless, except that it worked.

FROM mcr.microsoft.com/windows/servercore:1909
LABEL Description="Built with BuildKit!"
SHELL ["powershell", "-command"]
RUN echo Write-Host -ForegroundColor Red Hello > wr.ps1
CMD ["powershell", ".\\wr.ps1"]

I don’t know yet if my containers do not have networking set up properly due to my Buildkit spec-generation hacks, or some other aspect of my setup unrelated to Buildkit.

As well as networking issues, filesystem commands do not function on Windows due to an assertion about idmapping support.

I was worried about API issues, so I had vendored containerd master into buildkit, and hcsshim master into containerd. However, I suspect that this wasn’t necessary, and I’ll back those out next time I look at this.

I’ve rebased https://github.com/TBBle/buildkit/tree/hacks_ahoy to the current version of #1314, so it should be relatively easy for anyone who wants to try this out, and perhaps try and turn some of my hacks into further valuable commits.

I have drafted #4387, which fixes the use of FROM mcr.microsoft.com/powershell:latest (the only example I tested). It should fix all Windows multi-arch images using FROM, and also pre-fix any future surprise corner cases like unexpected cache layer hits on different OS versions.

It probably doesn’t, but only because all the file-copy APIs in BuildKit fail an assertion on Windows related to permissions support.

I really should get back to this. It got jammed up behind questions about containerd 1.2 support, and then other stuff came up.

Buildkit is not supported for Windows containers in docker 18.06/18.09

Thank you. I have that part working, or had last time I tried it.

My current stall is that I’m waiting for https://github.com/Microsoft/hcsshim/pull/901 to land so I can (hopefully) land https://github.com/containerd/containerd/pull/4419, once I work out where and how to avoid the Server 2019 breakage in the snapshotter tests. That would give me a working containerd that’s usable for CI of BuildKit, so that I can work on BuildKit without having to worry about changes and fixes being reverted or subverted because there’s no CI for Windows with containerd.

The actual time I spend on this (and it’s been infrequent recently) has gone into trying to reproduce the breakage in https://github.com/containerd/containerd/pull/4419 directly in https://github.com/Microsoft/hcsshim (since I’m 90% sure it’s an OS-level thing, not the containerd code), and/or to find a workaround. I haven’t yet investigated (but need to) whether Windows Server 2022 shows the same issue, because if not, then at least it’s not a going-forward blocker.

Small progress report. I now have networking functional for the containerd worker under Windows. It’s a minor hassle to set up using BuildKit and containerd directly (as you have to source and configure a CNI plugin yourself, and the Windows CNI landscape is… rough), but Docker provides its own managed network stack to use with BuildKit, so once someone implements the Docker side of the Buildkit integration, it won’t be any more hassle than networking under any other setup.

No containerd changes this time, as containerd happily uses whatever CNI setup you pass it.

I now have the below functioning, see #1585 for details.

FROM mcr.microsoft.com/windows/servercore:2004

LABEL Description="Python" Vendor="Python Software Foundation" Version="3.7.3"

RUN powershell.exe -Command \
    $ErrorActionPreference = 'Stop'; \
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12; \
    wget https://www.python.org/ftp/python/3.7.3/python-3.7.3.exe -OutFile c:\python-3.7.3.exe ; \
    Start-Process c:\python-3.7.3.exe -ArgumentList '/quiet InstallAllUsers=1 PrependPath=1' -Wait ; \
    Remove-Item c:\python-3.7.3.exe -Force

So with #3518 landed 🎉, @gabriel-samfira’s list of existing patches-to-land appears complete, and with one minor fix (#4364), I was able to build a reasonably trivial Dockerfile using BuildKit master branch and containerd 1.7.7, and then execute it with nerdctl 1.6.7 (which also turns out to need some fixes, but they’re only cosmetic). (Edit: I just remembered that I had also updated the Windows CNI plugins to support CNI 1.0.0 for nerdctl, and that was probably also necessary for BuildKit; I got nerdctl working before I started with BuildKit.)

A super simple Dockerfile:

FROM mcr.microsoft.com/powershell:lts-nanoserver-ltsc2022
LABEL Description="Built with BuildKit!"
WORKDIR C:/
SHELL ["C:\\Program Files\\PowerShell\\pwsh.exe", "-command"]
ENTRYPOINT ["C:\\Program Files\\PowerShell\\pwsh.exe"]
RUN dir C:/
RUN echo 'Write-Host -ForegroundColor DarkGreen Hello World' > C:/wr.ps1
RUN echo 'Write-Host -ForegroundColor DarkBlue Hello World' > C:/wrblue.ps1
CMD ["-command", "C:/wr.ps1"]

Built and run with:

buildctl build --frontend dockerfile.v0 --local context=. --local dockerfile=. --output type=image,name=docker.io/tbble/supersimpleDocker
nerdctl run -it --rm docker.io/tbble/supersimpleDocker
nerdctl run -it --rm docker.io/tbble/supersimpleDocker -command C:/wrblue.ps1

There’s still a few rough spots that need fixing:

  • containerd vendoring needs to be updated, right now we don’t have the Stable ABI support in the platform matchers, so Windows 11 hosts can’t select the correct (LTSC2022) image from a manifest list. Workaround is to use non-manifest list refs, so not super-painful but breaks a lot of existing examples. Edit: We’re already vendoring containerd 1.7.7, so I’m not sure precisely what’s wrong here. (ctr run and nerdctl run are both fine with mcr.microsoft.com/windows/nanoserver:ltsc2022 but BuildKit can’t find a matching platform on my Windows 11 host when I FROM it).

  • Something is mangling the PATH, as you can see in my above Dockerfile, I have to give the full path to pwsh.exe. ctr run does not need the full path to pwsh.exe for the base image, but does for the resulting image, and during the build, BuildKit can’t find pwsh.exe without the full path. nerdctl has a similar issue, it can’t run even the base image without specifying the full path. So this one’s annoying, breaks existing images, and I think there’s a ticket open for it already, although it may be on nerdctl. Edit: Turns out I was thinking of #3158, which points to (amongst other things) earlier comments in this ticket, and includes a full description (well written by a very handsome author, if I do say so myself) of the underlying issue here.
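To make the first bullet (the platform matcher) a bit more concrete, here is a rough Python sketch of the kind of check involved. It is not the containerd implementation: the function names are hypothetical, the rule is simplified, the revision numbers in the usage example are illustrative, and it assumes that build 20348 (LTSC2022) is the first release whose images may run on newer host builds.

```python
from typing import List, Optional, Tuple

# Assumption for this sketch: build 20348 (Windows Server 2022 / LTSC2022)
# is the first release whose images may run on newer host builds.
STABLE_ABI_BASELINE = 20348


def parse_os_version(os_version: str) -> Tuple[int, int, int]:
    """Split a string like '10.0.20348.2031' into (major, minor, build)."""
    major, minor, build = os_version.split(".")[:3]
    return int(major), int(minor), int(build)


def is_compatible(host: str, image: str) -> bool:
    """Simplified process-isolation compatibility check (hypothetical helper)."""
    h_major, h_minor, h_build = parse_os_version(host)
    i_major, i_minor, i_build = parse_os_version(image)
    if (h_major, h_minor) != (i_major, i_minor):
        return False
    if h_build < STABLE_ABI_BASELINE:
        # Hosts older than the baseline need an exact build match.
        return h_build == i_build
    # Newer hosts accept any image from the baseline up to their own build.
    return STABLE_ABI_BASELINE <= i_build <= h_build


def pick_manifest(host: str, candidates: List[str]) -> Optional[str]:
    """Return the newest compatible os.version from a manifest list, if any."""
    compatible = [c for c in candidates if is_compatible(host, c)]
    return max(compatible, key=parse_os_version, default=None)


# A Windows 11 (build 22621) host choosing between LTSC2019 and LTSC2022 entries.
print(pick_manifest("10.0.22621.2428", ["10.0.17763.4974", "10.0.20348.2031"]))
```

With an exact-match-only rule, the same Windows 11 host would find no candidate at all, which matches the "can't find a matching platform" symptom described in that bullet.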

While I was at it, I got nerdctl build trivially working. Mostly just removing the “not supported on Windows” check: https://github.com/containerd/nerdctl/pull/2587


I was wondering if it was time to start running-up the integration test suite on Windows. However, I noticed that the existing integration test suite relies on running inside a container, and that’s probably a non-starter for Windows unless we want to blow through into Host Process mode. (Which is actually probably the right long-term approach, but I don’t know if that works with Docker, or on the GHA hosted Windows runners.)

Potentially, running buildkitd and containerd outside a container, and buildctl, nerdctl, and the rest of the test suite-used utilities inside a container would be doable. I have a vague recollection that mapping named pipes into a Windows container should work, and I don’t think buildctl relies on mounting container images locally or anything else that would be hard inside a regular WCOW container.

We also have a bit of a circular dependency for running up the integration-test image anyway, since it relies on BuildKit (and the build process relies on buildx and bake as well) so I believe we’re going to have to (temporarily, at least) have a separate parallel GHA test setup and a separate Dockerfile.windows to build the equivalent of the integration-test container image.

With #1314, and some more hacking on things, I’ve gotten to the point where my next failure is coming from inside containerd, or the connection to it.

PS C:\Users\paulh\Documents\BuildKit\supersimpleDocker> buildctl --debug build --frontend=dockerfile.v0 --local context=. --local dockerfile=.
time="2020-01-06T08:03:16+11:00" level=debug msg="serving grpc connection"
[+] Building 4.7s (4/5)
[+] Building 4.7s (5/5) FINISHED
 => [internal] load build definition from Dockerfile                                                                     0.0s
 => => transferring dockerfile: 588B                                                                                     0.0s
 => [internal] load .dockerignore                                                                                        0.0s
 => => transferring context: 2B                                                                                          0.0s
 => [internal] load metadata for mcr.microsoft.com/windows/servercore:1909                                               0.2s
 => CACHED [1/2] FROM mcr.microsoft.com/windows/servercore:1909@sha256:12327ccba5d74921479cc95b56e9422278ac3565740c2a46  0.0s
 => => resolve mcr.microsoft.com/windows/servercore:1909@sha256:12327ccba5d74921479cc95b56e9422278ac3565740c2a46359bf0a  0.0s
 => ERROR [2/2] RUN echo Write-Host -ForegroundColor Red Hello > wr.ps1                                                  4.4s
------
 > [2/2] RUN echo Write-Host -ForegroundColor Red Hello > wr.ps1:
------
error: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to build LLB: executor failed running [powershell -command echo Write-Host -ForegroundColor Red Hello > wr.ps1]: failure waiting for process: rpc error: code = Unknown desc = ttrpc: closed: unknown
failed to solve
github.com/moby/buildkit/client.(*Client).solve.func2
        C:/Users/paulh/go/src/github.com/moby/buildkit/client/solve.go:203
github.com/moby/buildkit/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
        C:/Users/paulh/go/src/github.com/moby/buildkit/vendor/golang.org/x/sync/errgroup/errgroup.go:57
runtime.goexit
        c:/go/src/runtime/asm_amd64.s:1357

I’ve pushed one commit that needs more work (breaks the auto tests) plus my hacks onto https://github.com/TBBle/buildkit/tree/hacks_ahoy, in case anyone else wants to play with this.

For reference, I was working with source from containerd/containerd#3929, to fix a blocking bug, and Microsoft/hcsshim#749, to let me build without gcc. For hcsshim, had I not been instrumenting the source, I could have used the nightly binary build of the containerd shim, and I’m planning to suggest/submit that their releases include pushing a container for the container managed /opt feature, which would avoid hunting down binaries and adding them to the $PATH. (Edit: Microsoft/hcsshim#750)

@Iristyle that is probably possible, but this issue is about real Windows containers, so let’s try to keep on topic.

Could someone explain in more detail what the actual technical problems are that prevent running buildkitd inside containers?

We can’t say for sure. I can’t, at least. Not until we actually try it. At this point we’re just guessing based on previous experience in other parts of the ecosystem. My hope is that it will work. If not in process containers, then at least in Hyper-V containers.

We’ll know more in the following weeks, and will add more relevant details and/or PRs to enable tests as well as the rest of the ecosystem tooling. The aim is to be as close as possible to the linux version in terms of UX.

I don’t think you can parallel-extract layers in an image, because each layer is a set of changes against the layer below. For example, if a higher layer processes a ‘delete’ for a particular file or directory, the extractors of the lower layers must then track that this happened and skip those paths.

At least for Windows Container Layers. It’s possible on Linux that the layer storage is done by some other mechanism that would let you extract layers in parallel, but I’m not that familiar with the Linux container storage implementation. (I don’t see such a code-path in the downloader though.)
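The ordering constraint can be shown with a toy model in Python. This is only a sketch: layers here are plain dicts of path to content, and a None value is a made-up stand-in for a delete marker, not the actual Windows or OCI layer representation.

```python
from typing import Dict, List, Optional

# Toy model: a layer maps a path to new content, or to None to mean
# "this path was deleted in this layer" (a stand-in for a delete marker).
Layer = Dict[str, Optional[str]]


def apply_layers(layers: List[Layer]) -> Dict[str, str]:
    """Apply layers bottom-to-top and return the resulting file tree."""
    tree: Dict[str, str] = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                # Remove the path itself and anything beneath it.
                prefix = path.rstrip("/") + "/"
                doomed = [p for p in tree if p == path or p.startswith(prefix)]
                for existing in doomed:
                    del tree[existing]
            else:
                tree[path] = content
    return tree


base = {"app/config.ini": "v1", "app/data/cache.bin": "old"}
upper = {"app/data": None, "app/config.ini": "v2"}

print(apply_layers([base, upper]))  # correct order: bottom layer first
print(apply_layers([upper, base]))  # wrong order: the delete is lost
```

Applying the layers in the wrong order resurrects the deleted directory and keeps the stale config, which is why independent extractors would need to coordinate on deletes rather than simply run side by side.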

I’m not sure if we do checksumming during extraction? Checksumming should happen when the layer is pulled and written to the content store, and that would also be in parallel as the data being downloaded will be piped through the checksummer in parallel to being written to disk, I expect.

Edit: I checked this, the digest is of the uncompressed data, so that’s done during extraction after all. So to parallelise checksumming, we’d have to separately decompress-and-checksum each layer so it could be done in parallel, which seems like a waste of CPU and IO bandwidth when right now we get checksumming for free as part of extracting the layer anyway.
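As an illustration of why the uncompressed digest is close to free when it rides along with decompression, here is a small Python sketch that hashes both the compressed bytes and the decompressed stream in a single read pass. It assumes a gzip-compressed layer and SHA-256, and digest_both is a hypothetical helper, not anything from containerd or BuildKit.

```python
import gzip
import hashlib
import io
import zlib


def digest_both(compressed_stream, chunk_size=64 * 1024):
    """Return (compressed_digest, uncompressed_digest) from one read pass."""
    compressed_hash = hashlib.sha256()
    uncompressed_hash = hashlib.sha256()
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = compressed_stream.read(chunk_size)
        if not chunk:
            break
        compressed_hash.update(chunk)
        # A real extractor would also feed these bytes to the tar reader.
        uncompressed_hash.update(decompressor.decompress(chunk))
    uncompressed_hash.update(decompressor.flush())
    return (
        "sha256:" + compressed_hash.hexdigest(),
        "sha256:" + uncompressed_hash.hexdigest(),
    )


payload = b"pretend this is a layer tarball" * 1000
blob = gzip.compress(payload)
blob_digest, diff_digest = digest_both(io.BytesIO(blob))
print(blob_digest)
print(diff_digest)
```

Checksumming the uncompressed data ahead of extraction would mean running the decompression loop a second time per layer, which is the wasted CPU and IO mentioned above.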

Ah, thank you. moby/moby#38541 is the PR reference I was looking for earlier.

Poking through, containerd doesn’t seem to publish Windows binaries in their releases despite having the new Windows V2 runtime in their 1.3.0 release, and their AppVeyor build pipeline doesn’t capture artifacts.

The required hcsshim project does publish artifacts from their AppVeyor pipeline, even though they don’t include them in their releases.

Both have recent-enough releases to meet the criteria laid out in moby/moby#38541 but they both also have active work on master which might make a difference.

containerd currently vendors a specific commit of hcsshim (Microsoft/hcsshim@d2849cbdb9dfe5f513292a9610ca2eb734cdd1e7), binaries for which can be fetched from AppVeyor. For containerd 1.3.2 (Microsoft/hcsshim@9e921883ac929bbe515b39793ece99ce3a9d7706) the binaries are also on AppVeyor but will expire in late February. Both of these vendored versions are older than the current hcsshim release, 0.8.7, whose artifacts are also on AppVeyor.

In the end, it’s not clear to me if this ecosystem is yet in a state to start trying to get BuildKit working, and containerd/containerd#1920 (which has not been updated since the switch to the Windows V2 API) gives me a reasonable level of doubt.

Since the core of the system is roughly working in master, and AFAIK all the upstream dependencies have released versions we can use, we should probably set some goals for closing this ticket, and track remaining work that needs further discussion in new tickets.

First question, do we want to keep this ticket around as a meta-tracking ticket? I suspect a lot of people are subscribed and would see this ticket closing as “It works”. It makes sense to me to keep using this ticket to track until the feature is release-notable.

I’d love to see WCOW land as supported in 0.13, but feel #3158 and the Platform Matcher issue for Windows 11 should be resolved first, as they represent regressions from the legacy builder in dockerd for common existing Dockerfile patterns. We also need test suite coverage, to identify any other regressions.

It just occurred to me that it might be worth collecting a list of large WCOW-based containers and doing some test-builds with them to shake out any other regressions. Since I have history with it, ue4-docker comes immediately to mind. I don’t think my own machine is strong enough for it. (It is probably also going to be bitten by #3158, since it uses RUN powershell frequently.) Core MS tools like PowerShell-Docker and dotnet-docker would be other candidates.

We probably should test and document the state of HyperV Isolation support. It’ll be interesting for people on Windows 11 and Windows Server 2022 hosts to build Windows Server 2019 containers, but whether those people are numerous enough to make it a release goal, I’m unsure. dotnet-docker is also a test-case for this, they appear to still support LTSC2019 and I think we don’t plan to support Windows Server LTSC 2019 as host for buildkitd. (But now I’m questioning that, did I confuse it with Docker 24? Or with LTSC2016 support?)

I don’t think LCOW is a release goal here. Although it might be easier to get the test suite running on that, there’s probably a bunch of things that are making WCOW assumptions in BuildKit, e.g., the Platform Matcher. And similarly, multi-platform build support probably isn’t interesting right now. (WCOW/LCOW would be doable once we have LCOW, I’m not sure if multi-architecture on top of that would be fun to implement, it’d probably be QEMU inside LCOW for Linux, and multi-arch Windows Containers is simultaneously ancient history and unknowable future)

I’m not sure what we’d need in terms of documentation. Presumably documentation of the various Windows-specific limitations is the bare minimum.

And then there’s trivial stuff like moving the buildkitd binary into the binaries image and anything else needed to make the released artifact usable. (I hope we don’t need to do an installer here. That seems like a bundler issue? nerdctl wants an installer for their “Windows supported” milestone, which would include buildkit for example.)

CC-in from the Microsoft side for greater visibility @lucillex @profnandaa @iankingori

I am not (yet) confident that the tests won’t check for the existence of files in the path where layers get mounted

They do not afaik, the data is exported with --output to registry/local/containerd and then checked.


Could someone explain in more detail what the actual technical problems are that prevent running buildkitd inside containers? This isn’t just for tests: I also want it to be possible to run buildx create to run any upstream release of buildkit as an isolated instance. In Linux, by default this means making a buildkit container. I’m also not sure atm if the frontend containers work in WCOW or not. That one is a slightly different problem though, as frontends do not require any extra privileges.

Ah, good point. The tests would be doing the mounting themselves so they don’t need to see the exact same filesystem as buildkitd, but need access to the same containerd backing store. I see now. That’d probably also be true on Linux in the same situation, i.e. if we were trying to run tests in a non-privileged container talking to buildkitd/containerd in a separate container.

Ah, is the local mounting done by buildctl, not buildkitd?

I think it is done by buildkitd, but the issue is a mixed bag. I am not (yet) confident that the tests won’t check for the existence of files in the path where layers get mounted (see continuity/fstest). Starting next week we will finally have time to start tackling the test suite.

This is being worked on here: https://github.com/microsoft/Windows-Containers/issues/34 and it seems there is some potential solution which is awaiting review (https://github.com/moby/buildkit/pulls/gabriel-samfira).

In the meantime, in that same issue, it’s mentioned that you can build Windows images already as long as you don’t include RUN statements, see: https://github.com/microsoft/Windows-Containers/issues/34#issuecomment-653215478

Though not using Dockerfiles or docker buildx, I did a quick PoC a while ago with Crane. It lets you assemble a Windows container image on Linux containers, but you need to have the EXEs/DLLs prebuilt on a Windows machine (similar to the approach above), see: https://gist.github.com/clarenceb/269c8bc69ea47b0022a34605844b531b

@TBBle it would also be nice to be able to build with buildx + the Kubernetes driver on Kubernetes Windows nodes. Would that work too?

@TBBle again, thanks a lot for looking into this!

Yeah, the Windows Containers filesystem performance particularly when importing/exporting layers isn’t wonderful, due to a number of hoops it has to jump through between Windows and OCI formats. That’s nothing to do with BuildKit though, that’s generally Docker Engine or Containerd’s responsibility, and both are mostly limited by the underlying systems.

One often-overlooked trick is if you have unpigz.exe in the daemon’s path, e.g., from https://blog.kowalczyk.info/software/pigz-for-windows.html, decompression of large layers will be a lot faster.
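A quick way to check whether that binary is discoverable is sketched below in Python. Note the caveat: this inspects the PATH of whatever process runs the script, which is not necessarily the PATH the daemon service sees, so treat a hit as a hint only. find_unpigz is a hypothetical helper name.

```python
import shutil


def find_unpigz(path=None):
    """Return the full path to unpigz on the given PATH, or None if absent.

    With path=None the current process's PATH is searched; on Windows,
    shutil.which also tries the PATHEXT extensions such as .exe.
    """
    return shutil.which("unpigz", path=path)


location = find_unpigz()
if location is None:
    print("unpigz not found on this PATH")
else:
    print("unpigz found at", location)
```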

Future TODO: Once https://github.com/microsoft/Windows-Containers/issues/31 is resolved, we can override C:\windows\system32\drivers\etc\hosts on Windows like we do on Linux.

@TBBle The vendored containerd does not need to be a stable release. We mostly vendor master to get the latest fixes. For the differ, I doubt the current windows-layers thing is usable. It is just for handling the different tar format (Windows has parent Hives/Files directories). I opened an issue to support it natively in https://github.com/containerd/containerd/issues/2469 as well, so we don’t need a hack. It would be nice if we could do the opposite as well (build Linux layers on Windows), but that is not a priority atm of course.

@olljanat I meant the error message from the build process.