concourse-docker: "failed to create volume", Concourse running in docker-compose on Linux

I’ve got Concourse running under docker-compose on a NixOS 18.03 VPS, and it’s working fine. I’m now trying to deploy exactly the same Concourse configuration to another NixOS 18.03 machine, but I’m not having any luck. I’m using the same docker-compose file and the same pipelines.

The new machine gives errors about being unable to create volumes:

Apr 12 21:55:49 nyarlathotep docker-compose[26088]: concourse_1  | {"timestamp":"2019-04-12T20:55:49.753780802Z","level":"error","source":"atc","message":"atc.pipelines.radar.scan-resource.interval-runner.tick.find-or-create-cow-volume-for-container.failed-to-create-volume-in-baggageclaim","data":{"container":"af97f489-2d27-4007-57b4-e5cb9c43e659","error":"failed to create volume","pipeline":"ci","resource":"concoursefiles-git","session":"18.1.4.1.1.3","team":"main","volume":"e843e1a7-4122-494b-5397-d0a94294e418"}}
Apr 12 21:55:49 nyarlathotep docker-compose[26088]: concourse_1  | {"timestamp":"2019-04-12T20:55:49.793734883Z","level":"error","source":"atc","message":"atc.pipelines.radar.scan-resource.interval-runner.tick.failed-to-fetch-image-for-container","data":{"container":"af97f489-2d27-4007-57b4-e5cb9c43e659","error":"failed to create volume","pipeline":"ci","resource":"concoursefiles-git","session":"18.1.4.1.1","team":"main"}}
Apr 12 21:55:49 nyarlathotep docker-compose[26088]: concourse_1  | {"timestamp":"2019-04-12T20:55:49.794088237Z","level":"error","source":"atc","message":"atc.pipelines.radar.scan-resource.interval-runner.tick.failed-to-initialize-new-container","data":{"error":"failed to create volume","pipeline":"ci","resource":"concoursefiles-git","session":"18.1.4.1.1","team":"main"}}

The concoursefiles-git resource it’s failing to create a volume for is a normal git resource. The other resources in the pipeline fail with the same error.

The pipeline is here: https://github.com/barrucadu/concoursefiles/blob/master/pipelines/ci.yml

This is the docker-compose file:

version: '3'

services:
  concourse:
    image: concourse/concourse
    command: quickstart
    privileged: true
    depends_on: [postgres, registry]
    ports: ["3003:8080"]
    environment:
      CONCOURSE_POSTGRES_HOST: postgres
      CONCOURSE_POSTGRES_USER: concourse
      CONCOURSE_POSTGRES_PASSWORD: concourse
      CONCOURSE_POSTGRES_DATABASE: concourse
      CONCOURSE_EXTERNAL_URL: "https://ci.nyarlathotep.barrucadu.co.uk"
      CONCOURSE_MAIN_TEAM_GITHUB_USER: "barrucadu"
      CONCOURSE_GITHUB_CLIENT_ID: "<omitted>"
      CONCOURSE_GITHUB_CLIENT_SECRET: "<omitted>"
      CONCOURSE_LOG_LEVEL: error
      CONCOURSE_GARDEN_LOG_LEVEL: error
    networks:
      - ci

  postgres:
    image: postgres
    environment:
      POSTGRES_DB: concourse
      POSTGRES_PASSWORD: concourse
      POSTGRES_USER: concourse
      PGDATA: /database
    networks:
      - ci
    volumes:
      - pgdata:/database

  registry:
    image: registry
    networks:
      ci:
        ipv4_address: "172.21.0.254"
        aliases: [ci-registry]
    volumes:
      - regdata:/var/lib/registry

networks:
  ci:
    ipam:
      driver: default
      config:
        - subnet: 172.21.0.0/16

volumes:
  pgdata:
  regdata:

I’m using the latest concourse/concourse image, as I set this up today. The Docker version is 18.09.2 (build 62479626f213818ba5b4565105a05277308587d5). What can I look at to help debug this?

About this issue

  • State: open
  • Created 5 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Yeah, that seems to work.

  1. Create a zvol and format it with ext4:

zfs create -V 10g rpool/concourse-workdir0-ext4
mkfs.ext4 /dev/zvol/rpool/concourse-workdir0-ext4

  2. Configure NixOS to mount it at /mnt/concourse-workdir0.

Into configuration.nix, add:

fileSystems."/mnt/concourse-workdir0" = {
  device = "/dev/zvol/rpool/concourse-workdir0-ext4";
  fsType = "ext4";
};

  3. Configure the worker to use that workdir (see the sketch below this list):

  • (bind?) mount the host’s /mnt/concourse-workdir0 to the worker container’s /workdir
  • set CONCOURSE_WORK_DIR to /workdir
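In docker-compose terms, step 3 might look roughly like this. It's a minimal sketch, assuming the worker runs as its own service rather than via quickstart; the concourse-worker service name and the /workdir path are illustrative, not taken from the setup above:

  concourse-worker:
    image: concourse/concourse
    command: worker
    privileged: true
    environment:
      CONCOURSE_WORK_DIR: /workdir
    volumes:
      # bind-mount the ext4-backed zvol from the host into the container
      - /mnt/concourse-workdir0:/workdir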

We are seeing this error very frequently in the Spring Boot builds. We are running v5.7.2 on the bosh-vsphere-esxi-ubuntu-xenial-go_agent 621.29 stemcell, using the overlay driver.

In web.stdout.log we have:

{"timestamp":"2020-01-23T15:35:59.923944142Z","level":"error","source":"atc","message":"atc.tracker.track.task-step.find-or-create-volume-for-container.failed-to-create-volume-in-baggageclaim","data":{"build":104720,"error":"failed to create volume","job":"build-pull-requests","job-id":2744,"pipeline":"spring-boot-2.3.x","session":"19.62686.7.31","step-name":"build-project","volume":"999ba5a8-f8a1-4e5d-5087-c5e3974e15e1"}}

In worker.stdout.log we see the baggageclaim error:

{"timestamp":"2020-01-23T15:35:55.212121511Z","level":"error","source":"baggageclaim","message":"baggageclaim.api.volume-server.create-volume-async.create-volume.failed-to-materialize-strategy","data":{"error":"exit status 1","handle":"999ba5a8-f8a1-4e5d-5087-c5e3974e15e1","session":"3.1.999394.1"}}
{"timestamp":"2020-01-23T15:35:55.299415431Z","level":"error","source":"baggageclaim","message":"baggageclaim.api.volume-server.create-volume-async.failed-to-create","data":{"error":"exit status 1","handle":"999ba5a8-f8a1-4e5d-5087-c5e3974e15e1","privileged":false,"session":"3.1.999394","strategy":{"type":"import","path":"/var/vcap/data/worker/work/volumes/live/24ae1aac-852c-4c5c-414d-29088119c8a3/volume","follow_symlinks":false}}}

The error occurs at the end of builds. The pipelines use task caches (https://concourse-ci.org/tasks.html#task-caches) to cache dependencies between runs; see https://github.com/spring-projects/spring-boot/blob/89237634c7931f275ddbddba176c7a826b1667cb/ci/tasks/build-project.yml#L7. When we query the volumes table by handle, we can confirm no record was created for 999ba5a8-f8a1-4e5d-5087-c5e3974e15e1.
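For reference, a task cache is declared in the caches section of a task config. A minimal, illustrative sketch only; the image, input name, and paths here are assumptions, not the actual build-project.yml:

platform: linux
image_resource:
  type: registry-image
  source: {repository: openjdk}
inputs:
  - name: git-repo
caches:
  - path: gradle   # kept on the worker and restored for later runs of this task
run:
  path: git-repo/ci/scripts/build.sh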

We considered underlying server load, so we enabled the limit-active-tasks container placement strategy, which distributed things nicely (thank you!). Now that load seems fine, it is mainly the Spring Boot pipelines that hit this issue on our multi-tenant https://ci.spring.io.

We can recreate all of the workers to make the issue go away for a few days, but it eventually comes back. We see a clear pattern of the error resurfacing after a number of green builds. Also reported in #concourse-operations.

@barrucadu In my case, setting:

CONCOURSE_WORK_DIR=/worker-state
CONCOURSE_WORKER_WORK_DIR=/worker-state

and adding a volume for the /worker-state directory in my worker’s service configuration was necessary for baggageclaim to create volumes.
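In a docker-compose setup like the one at the top of this issue, that could look roughly like the following. A minimal sketch, assuming a dedicated worker service; the worker service name and the worker-state volume name are illustrative:

  worker:
    image: concourse/concourse
    command: worker
    privileged: true
    environment:
      CONCOURSE_WORK_DIR: /worker-state
    volumes:
      - worker-state:/worker-state

volumes:
  worker-state: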