concourse-docker: "failed to create volume", Concourse running in docker-compose on Linux
I’ve got Concourse running on a NixOS 18.03 VPS inside docker-compose, and this is working fine. I’m now trying to deploy exactly the same Concourse configuration to another NixOS 18.03 machine, but I’m not having any luck. I’m using the same docker-compose file and the same pipelines.
The new machine gives errors about being unable to create volumes:
Apr 12 21:55:49 nyarlathotep docker-compose[26088]: concourse_1 | {"timestamp":"2019-04-12T20:55:49.753780802Z","level":"error","source":"atc","message":"atc.pipelines.radar.scan-resource.interval-runner.tick.find-or-create-cow-volume-for-container.failed-to-create-volume-in-baggageclaim","data":{"container":"af97f489-2d27-4007-57b4-e5cb9c43e659","error":"failed to create volume","pipeline":"ci","resource":"concoursefiles-git","session":"18.1.4.1.1.3","team":"main","volume":"e843e1a7-4122-494b-5397-d0a94294e418"}}
Apr 12 21:55:49 nyarlathotep docker-compose[26088]: concourse_1 | {"timestamp":"2019-04-12T20:55:49.793734883Z","level":"error","source":"atc","message":"atc.pipelines.radar.scan-resource.interval-runner.tick.failed-to-fetch-image-for-container","data":{"container":"af97f489-2d27-4007-57b4-e5cb9c43e659","error":"failed to create volume","pipeline":"ci","resource":"concoursefiles-git","session":"18.1.4.1.1","team":"main"}}
Apr 12 21:55:49 nyarlathotep docker-compose[26088]: concourse_1 | {"timestamp":"2019-04-12T20:55:49.794088237Z","level":"error","source":"atc","message":"atc.pipelines.radar.scan-resource.interval-runner.tick.failed-to-initialize-new-container","data":{"error":"failed to create volume","pipeline":"ci","resource":"concoursefiles-git","session":"18.1.4.1.1","team":"main"}}
The `concoursefiles-git` resource it’s failing to create a volume for is a normal git resource. The other resources in the pipeline fail with the same error.
The pipeline is here: https://github.com/barrucadu/concoursefiles/blob/master/pipelines/ci.yml
This is the docker-compose file:
```yaml
version: '3'

services:
  concourse:
    image: concourse/concourse
    command: quickstart
    privileged: true
    depends_on: [postgres, registry]
    ports: ["3003:8080"]
    environment:
      CONCOURSE_POSTGRES_HOST: postgres
      CONCOURSE_POSTGRES_USER: concourse
      CONCOURSE_POSTGRES_PASSWORD: concourse
      CONCOURSE_POSTGRES_DATABASE: concourse
      CONCOURSE_EXTERNAL_URL: "https://ci.nyarlathotep.barrucadu.co.uk"
      CONCOURSE_MAIN_TEAM_GITHUB_USER: "barrucadu"
      CONCOURSE_GITHUB_CLIENT_ID: "<omitted>"
      CONCOURSE_GITHUB_CLIENT_SECRET: "<omitted>"
      CONCOURSE_LOG_LEVEL: error
      CONCOURSE_GARDEN_LOG_LEVEL: error
    networks:
      - ci

  postgres:
    image: postgres
    environment:
      POSTGRES_DB: concourse
      POSTGRES_PASSWORD: concourse
      POSTGRES_USER: concourse
      PGDATA: /database
    networks:
      - ci
    volumes:
      - pgdata:/database

  registry:
    image: registry
    networks:
      ci:
        ipv4_address: "172.21.0.254"
        aliases: [ci-registry]
    volumes:
      - regdata:/var/lib/registry

networks:
  ci:
    ipam:
      driver: default
      config:
        - subnet: 172.21.0.0/16

volumes:
  pgdata:
  regdata:
```
I’m using the latest concourse/concourse image, as I set this up today. The Docker version is 18.09.2 (build 62479626f213818ba5b4565105a05277308587d5). What can I look at to help debug this?
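Since the atc log only says “failed to create volume”, a first step is to make baggageclaim itself log the underlying error. This is a sketch of environment tweaks to the `concourse` service, assuming the usual `CONCOURSE_*`-style flag names — check `concourse quickstart --help` in your image to confirm they exist in your version:

```yaml
# Sketch: extra environment for the `concourse` service above.
# The baggageclaim variable names are assumptions; confirm them
# against `concourse quickstart --help` for your image version.
services:
  concourse:
    environment:
      CONCOURSE_LOG_LEVEL: debug               # was "error"; surfaces the real cause
      CONCOURSE_GARDEN_LOG_LEVEL: debug        # was "error"
      CONCOURSE_BAGGAGECLAIM_LOG_LEVEL: debug  # assumed flag; logs volume-creation failures
      CONCOURSE_BAGGAGECLAIM_DRIVER: naive     # assumed flag; worth trying if the host
                                               # filesystem lacks overlay/btrfs support
```

If forcing the `naive` driver makes the error go away, the filesystem backing the worker’s work dir on the new machine probably doesn’t support the driver baggageclaim auto-detected.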
About this issue
- State: open
- Created 5 years ago
- Comments: 15 (6 by maintainers)
Yeah that seems to work.

`/mnt/concourse-workdir0`

Into `configuration.nix`, add:

`CONCOURSE_WORK_DIR` to `/workdir`

We are seeing this error very frequently in the Spring Boot builds. We are running v5.7.2 on the `bosh-vsphere-esxi-ubuntu-xenial-go_agent 621.29` stemcell, using the `overlay` driver.

In `web.stdout.log` we have:

In `worker.stdout.log` we see the baggageclaim error:

The error arrives at the end of builds. The pipelines use https://concourse-ci.org/tasks.html#task-caches to cache dependencies between runs: https://github.com/spring-projects/spring-boot/blob/89237634c7931f275ddbddba176c7a826b1667cb/ci/tasks/build-project.yml#L7

When we query the `volumes` table by `handle`, we can confirm no record was created for `999ba5a8-f8a1-4e5d-5087-c5e3974e15e1`.

We considered underlying server load, so enabled `container-placement-strategy-limit-active-tasks`, which distributed things nicely (thank you!). Now that load seems fine, it is mainly the Spring Boot pipelines that have this issue in our multi-tenant https://ci.spring.io.

We can re-create all of the workers to make the issue go away for a few days, but it eventually comes back. We see a clear pattern of the error re-surfacing after a number of green builds (reported in #concourse-operations).
@barrucadu setting:

and adding a volume for the `/worker-state` directory in my worker’s service configuration was necessary for baggageclaim to create volumes.
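Putting the two reported fixes together, a minimal sketch of the worker-side changes in compose terms — the variable name `CONCOURSE_WORK_DIR` and the `/workdir` and `/worker-state` paths come from the comments above, while the named-volume names here are made up for illustration; verify the flags against your Concourse version:

```yaml
# Sketch only: apply the fixes reported in this thread to the
# `concourse` service. CONCOURSE_WORK_DIR and the /worker-state
# mount come from the comments above; the volume names are invented.
services:
  concourse:
    environment:
      CONCOURSE_WORK_DIR: /workdir       # point the worker at a dedicated work dir
    volumes:
      - workdir:/workdir                 # back the work dir with a named Docker volume
      - worker-state:/worker-state       # persist baggageclaim's state across restarts

volumes:
  workdir:
  worker-state:
```

The point of both changes is the same: put baggageclaim’s directories on a filesystem it can actually create volumes on, rather than on whatever backs the container’s writable layer.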