compose: Can't access GPU during build with docker compose v2
Description
Accessing the GPU during build does not work with Docker Compose v2.
It does work when the container is running, but some of my build steps need the GPU for CUDA compilation.
It doesn't seem to work with either the runtime option or the deploy.resources reservation described here.
Building the same project with docker-compose v1 does work.
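To illustrate the "works at run time, fails at build time" distinction, a hypothetical check (not part of the original report), assuming the image was already built by a working path such as docker-compose v1:

# Run-time GPU access works: the runtime/reservation settings in the compose file apply.
docker compose run --rm nvidia-test python -c "import torch; assert torch.cuda.is_available()"

# Build-time GPU access does not: the same assertion in the Dockerfile's RUN step raises AssertionError.
docker compose build nvidia-test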
Steps to reproduce the issue:
- docker compose v2 doesn’t build
The attached yml + Dockerfile fail with an AssertionError.
docker compose build nvidia-test
docker compose build nvidia-test-2
- docker-compose v1 works
With docker-compose v1 installed via pip, the same yml and Dockerfile build successfully.
docker-compose build nvidia-test
docker-compose build nvidia-test-2
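The difference appears to track the builder rather than Compose itself. As a hedged sketch (not part of the original report), the same behavior can be reproduced with a plain docker build of the attached Dockerfile:

# Classic builder: the daemon's default runtime (nvidia, see docker info below)
# applies to RUN steps, so the torch assertion passes.
DOCKER_BUILDKIT=0 docker build -t nvidia-test .

# BuildKit: RUN steps execute without the NVIDIA runtime, so the assertion fails.
DOCKER_BUILDKIT=1 docker build -t nvidia-test .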
Output of docker compose version:
v2
docker compose version
Docker Compose version v2.6.0
v1
docker-compose version
docker-compose version 1.29.2, build unknown
docker-py version: 5.0.3
CPython version: 3.9.4
OpenSSL version: OpenSSL 1.1.1k 25 Mar 2021
Output of docker info:
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
compose: Docker Compose (Docker Inc., v2.6.0)
scan: Docker Scan (Docker Inc., v0.17.0)
Server:
Containers: 34
Running: 2
Paused: 0
Stopped: 32
Images: 31
Server Version: 20.10.17
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia
Default Runtime: nvidia
Init Binary: docker-init
containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
runc version: v1.1.2-0-ga916309
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.15.0-1015-aws
Operating System: Ubuntu 20.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.34GiB
Name: ip-172-31-33-172
ID: 7QW3:4AFO:BJBD:IH6R:IXVA:WWW2:Z5EL:HRH4:E4Y4:MFZD:KUWE:VH75
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Dockerfile
FROM pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime
RUN python -c "import torch;assert torch.cuda.is_available()"
docker-compose.yml
version: "3.9"
services:
nvidia-test:
build: ./
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [ gpu ]
nvidia-test-2:
build: ./
runtime: nvidia
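For reference, the "Default Runtime: nvidia" shown in docker info above is what lets the classic (non-BuildKit) builder reach the GPU in RUN steps. That default is usually configured in /etc/docker/daemon.json; the file is not included in the issue, so the snippet below is only a sketch of the typical NVIDIA Container Toolkit setup:

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}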
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 16 (2 by maintainers)
This isn’t really a solution. I want to use BuildKit; it provides cache volumes which speed up builds a lot. Right now I’m building with docker-compose and running the containers with docker compose, which works for now.

I’m also having this problem, and disabling BuildKit via DOCKER_BUILDKIT=0 solves this strange problem for me. Isn’t there any other way to fix this?

DOCKER_BUILDKIT=0 solves this issue for me as well, though it would be nice to have a reference in the documentation for it.

@danielgafni Are you saying that by using docker-compose this problem can be averted and we can ALSO use BuildKit?

Can you try running without BuildKit and see if the result is any different?

We experience the same issue. This is currently holding us back from making the transition to Compose v2 and the CLI plugin.
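For concreteness, the workaround the commenters describe amounts to forcing Compose v2 back to the classic builder; a minimal sketch, assuming the v2 CLI honors the DOCKER_BUILDKIT variable as reported above:

# Disable BuildKit for a single build so RUN steps use the nvidia default runtime.
DOCKER_BUILDKIT=0 docker compose build nvidia-test
DOCKER_BUILDKIT=0 docker compose build nvidia-test-2

# Or disable it for the whole shell session.
export DOCKER_BUILDKIT=0
docker compose build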