compose: Can't access GPU during build with docker compose v2

Description

Accessing the GPU during build using Docker Compose v2 doesn’t work.

It does work when the container is running, but some of my build steps need the GPU for compilation with CUDA.

It doesn’t seem to work with either the runtime or the resources settings, as described here.

This does work using docker-compose v1.

Steps to reproduce the issue:

  • docker-compose v2 doesn’t build

The attached yml + Dockerfile fail with an AssertionError.

docker compose build nvidia-test
docker compose build nvidia-test-2

  • docker-compose v1 works

Running with docker-compose v1 installed via pip, the attached yml and Dockerfile build successfully.

docker-compose build nvidia-test
docker-compose build nvidia-test-2

Output of docker compose version:

v2

docker compose version
Docker Compose version v2.6.0

v1

docker-compose version
docker-compose version 1.29.2, build unknown
docker-py version: 5.0.3
CPython version: 3.9.4
OpenSSL version: OpenSSL 1.1.1k  25 Mar 2021

Output of docker info:

Client:                                                  
 Context:    default                                                                                              
 Debug Mode: false                                                                                                
 Plugins:                                                                                                         
  app: Docker App (Docker Inc., v0.9.1-beta3)                                                                     
  buildx: Docker Buildx (Docker Inc., v0.8.2-docker)                                                              
  compose: Docker Compose (Docker Inc., v2.6.0)                                                                   
  scan: Docker Scan (Docker Inc., v0.17.0)                                                                        
                                                                                                                  
Server:                                                                                                           
 Containers: 34                                          
  Running: 2                                             
  Paused: 0            
  Stopped: 32
 Images: 31                                                                                                       
 Server Version: 20.10.17                                
 Storage Driver: overlay2                                                                                         
  Backing Filesystem: extfs                              
  Supports d_type: true 
  Native Overlay Diff: true
  userxattr: false                                       
 Logging Driver: json-file                                                                                        
 Cgroup Driver: cgroupfs                                                                                          
 Cgroup Version: 1                                                                                                
 Plugins:         
  Volume: local                                          
  Network: bridge host ipvlan macvlan null overlay       
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog                             
 Swarm: inactive                                         
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia                                       
 Default Runtime: nvidia                                 
 Init Binary: docker-init                                
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.15.0-1015-aws
 Operating System: Ubuntu 20.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.34GiB
 Name: ip-172-31-33-172
 ID: 7QW3:4AFO:BJBD:IH6R:IXVA:WWW2:Z5EL:HRH4:E4Y4:MFZD:KUWE:VH75
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Dockerfile

FROM pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime

RUN python -c "import torch;assert torch.cuda.is_available()"
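The one-line assertion above fails with a bare AssertionError, which says little about why the GPU is missing. A minimal sketch of a more talkative check follows, assuming it is copied into the image and run from a RUN step. The function names (nvidia_runtime_hints, check_cuda, main) are hypothetical, and the hints it collects are only rough indicators of whether the NVIDIA runtime was applied to the build container.

```python
import os
import shutil


def nvidia_runtime_hints():
    """Collect rough signs that a GPU was exposed to this process."""
    return {
        # Variable the NVIDIA container runtime reads to select GPUs.
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # Control device node of the NVIDIA kernel driver.
        "/dev/nvidiactl": os.path.exists("/dev/nvidiactl"),
        # Full path to nvidia-smi if it is on PATH, otherwise None.
        "nvidia-smi": shutil.which("nvidia-smi"),
    }


def check_cuda():
    """Return (ok, message) instead of raising a bare AssertionError."""
    hints = nvidia_runtime_hints()
    try:
        import torch
    except ImportError:
        return False, f"torch is not installed; runtime hints: {hints}"
    if torch.cuda.is_available():
        return True, f"CUDA available, {torch.cuda.device_count()} device(s)"
    return False, f"CUDA not available; runtime hints: {hints}"


def main():
    """Print the diagnosis and return a process exit code (0 = GPU found)."""
    ok, message = check_cuda()
    print(message)
    return 0 if ok else 1
```

Passing the return value of main() to sys.exit() would make the build step fail exactly when the original assertion does, while the printed hints show whether the device node and environment variable were present during the build.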

docker-compose.yml

version: "3.9"

services:
  nvidia-test:
    build: ./
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]

  nvidia-test-2:
    build: ./
    runtime: nvidia

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 16 (2 by maintainers)

Most upvoted comments

This isn’t really a solution. I want to use BuildKit; it provides cache volumes, which speed up builds a lot.

Right now I’m building with docker-compose and running the containers with docker compose, which works for now.

I’m also having this problem, and disabling BuildKit by setting DOCKER_BUILDKIT=0 solves this strange problem for me. Isn’t there any other way to fix this?

DOCKER_BUILDKIT=0 solves this issue for me as well, though it would be nice to have a reference in the documentation for it.

@danielgafni Are you saying that by using docker-compose this problem can be averted and we can ALSO use BuildKit?

Can you try running without buildkit and see if the result is any different?

DOCKER_BUILDKIT=0 docker compose build nvidia-test
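For build scripts or CI jobs, the same workaround can be applied to both services from the report without exporting the variable globally. This is only a sketch of the DOCKER_BUILDKIT=0 approach the commenters describe: the helper names are hypothetical, and it gives up BuildKit features for these builds.

```python
import os
import subprocess


def legacy_build_env(base_env=None):
    """Copy the environment and disable BuildKit for the child process only."""
    env = dict(os.environ if base_env is None else base_env)
    env["DOCKER_BUILDKIT"] = "0"
    return env


def build_command(service):
    """Return the argv for building one service with the Compose v2 plugin."""
    return ["docker", "compose", "build", service]


def build_services(services):
    """Build each service with BuildKit disabled; raises if a build fails."""
    env = legacy_build_env()
    for service in services:
        subprocess.run(build_command(service), env=env, check=True)
```

Calling build_services(["nvidia-test", "nvidia-test-2"]) from the directory containing docker-compose.yml would run the two builds from the report in sequence, leaving the parent shell's environment untouched.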

We experience the same issue. This is currently holding us back from making the transition to compose v2 and the cli plugin.