concourse: Resources failing in Kubernetes with Google Container-Optimized OS after upgrade to 3.1.0

Bug Report

  • Concourse version: 3.1.0
  • Deployment type: Docker
  • Infrastructure/IaaS: Kubernetes

After upgrading to 3.1.0, all git and time resource checks fail with:

runc create: exit status 1: container_linux.go:264: starting container process caused "process_linux.go:339: container init caused \"rootfs_linux.go:57: mounting \\\"/worker-state/3.1.0/assets/bin/init\\\" to rootfs \\\"/worker-state/volumes/live/26e7c69d-69fc-4f0f-507d-4b30c461a78f/volume\\\" at \\\"/worker-state/volumes/live/26e7c69d-69fc-4f0f-507d-4b30c461a78f/volume/tmp/garden-init\\\" caused \\\"open /worker-state/volumes/live/26e7c69d-69fc-4f0f-507d-4b30c461a78f/volume/tmp/garden-init: permission denied\\\"\""

Other resources seem to check fine.
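Every report in this thread shows the same runc failure, buried under several layers of escaped quotes. A small helper can extract the innermost "open <path>: <reason>" clause, which makes reports from different versions easier to compare. This is a sketch only. The regular expression is an assumption based on the messages quoted here, not on any documented runc error format.

```python
import re
from typing import Optional, Tuple

# Assumed pattern, derived only from the messages quoted in this thread:
# runc ends the cause chain with "open <path>: <reason>". Paths stop at
# whitespace, ':', a backslash, or a double quote.
_OPEN_FAILURE = re.compile(r'open (?P<path>/[^\s:\\"]+): (?P<reason>[a-z ]+)')


def innermost_open_failure(message: str) -> Optional[Tuple[str, str]]:
    """Return (path, reason) from the last "open ...: ..." clause, or None."""
    matches = list(_OPEN_FAILURE.finditer(message))
    if not matches:
        return None
    last = matches[-1]
    return last.group("path"), last.group("reason").strip()
```

For the error above, the helper yields the garden-init path inside the volume together with "permission denied". For a message with no "open" clause, such as "mount: permission denied (are you root?)", it returns None.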

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 19
  • Comments: 70 (27 by maintainers)

Most upvoted comments

This is still an issue for me with 3.2.1 - any plans to fix this?

I was able to avoid it by setting the equivalent environment variable, CONCOURSE_BAGGAGECLAIM_DRIVER=naive, on the worker instead of passing the --baggageclaim-driver=naive flag:

# spec.template.spec.containers[].env:
            - name: CONCOURSE_BAGGAGECLAIM_DRIVER
              value: "naive"

Kubernetes 1.6.4 (on GKE), Concourse 3.3.0 (from Helm chart concourse-0.1.3)

I can confirm that my Concourse 3.3.0 deployment on a GKE k8s 1.6.4 cluster, with workers running Linux kernel 4.4.35+, has the issue.

From kubectl describe nodes:

System Info:
 Kernel Version:                4.4.35+
 OS Image:                      Container-Optimized OS from Google
 Operating System:              linux
 Architecture:                  amd64
 Container Runtime Version:     docker://1.11.2
 Kubelet Version:               v1.6.4
 Kube-Proxy Version:            v1.6.4

It seems to be related to some kernel parameter.

I can reproduce it with kernels 4.4.35+ (Container-Optimized OS, Google Cloud) and 4.4.65-k8s (Debian, Kubernetes kops).

It works fine on 4.4.0 (Ubuntu Xenial) and 4.9.24 (CoreOS).

Pipeline to test:

---
jobs:
  - name: test-overlay
    plan:
      - get: time
      - task: test-task
        config:
          platform: linux
          image_resource:
            type: docker-image
            source:
              repository: busybox
          run:
            path: sh
            args:
              - -c
              - |
                echo hello world

resources:
  - name: time
    type: time
    source:
      interval: 1m

Unfortunately, we also experience the issue with Concourse 3.1.1 on AWS (running on Kubernetes using the Helm chart). OS: Debian Jessie. Baggageclaim driver: overlay.

The problem can be reproduced with the following pipeline:

jobs:
  - name: test
    plan:
      - task: test
        config:
          platform: linux
          image_resource:
            type: docker-image
            source:
              repository: alpine
          run:
            path: ls
            args: ["-la", "."]

Concourse successfully pulls the Docker image, but fails when running the task in a non-privileged container:

{"timestamp":"1497359514.769297361","source":"guardian","message":"guardian.api.garden-server.create.creating","log_level":0,"data":{"request":{"Handle":"c2888395-d0bb-4e78-5da2-265b128f72c3","GraceTime":0,"RootFSPath":"raw:///concourse-work-dir/volumes/live/9481101c-fc45-475f-453e-072880b49f12/volume","BindMounts":[{"src_path":"/concourse-work-dir/volumes/live/573b155c-76ab-4ccf-4f59-5ccad44a2b30/volume","dst_path":"/scratch","mode":1}],"Network":"","Privileged":false,"Limits":{"bandwidth_limits":{},"cpu_limits":{},"disk_limits":{},"memory_limits":{},"pid_limits":{}}},"session":"3.1.216"}}
{"timestamp":"1497359514.769345284","source":"guardian","message":"guardian.create.start","log_level":1,"data":{"handle":"c2888395-d0bb-4e78-5da2-265b128f72c3","session":"189"}}
{"timestamp":"1497359514.769390821","source":"guardian","message":"guardian.create.containerizer-create.start","log_level":1,"data":{"handle":"c2888395-d0bb-4e78-5da2-265b128f72c3","session":"189.2"}}
{"timestamp":"1497359514.786541939","source":"guardian","message":"guardian.create.containerizer-create.depot-create.started","log_level":1,"data":{"handle":"c2888395-d0bb-4e78-5da2-265b128f72c3","session":"189.2.1"}}
{"timestamp":"1497359514.787511349","source":"guardian","message":"guardian.create.containerizer-create.depot-create.finished","log_level":1,"data":{"handle":"c2888395-d0bb-4e78-5da2-265b128f72c3","session":"189.2.1"}}
{"timestamp":"1497359514.787638426","source":"guardian","message":"guardian.create.containerizer-create.lookup.started","log_level":0,"data":{"handle":"c2888395-d0bb-4e78-5da2-265b128f72c3","session":"189.2.2"}}
{"timestamp":"1497359514.787839890","source":"guardian","message":"guardian.create.containerizer-create.lookup.finished","log_level":0,"data":{"handle":"c2888395-d0bb-4e78-5da2-265b128f72c3","session":"189.2.2"}}
{"timestamp":"1497359514.787889004","source":"guardian","message":"guardian.create.containerizer-create.create.creating","log_level":1,"data":{"bundle":"/concourse-work-dir/depot/c2888395-d0bb-4e78-5da2-265b128f72c3","bundlePath":"/concourse-work-dir/depot/c2888395-d0bb-4e78-5da2-265b128f72c3","handle":"c2888395-d0bb-4e78-5da2-265b128f72c3","id":"c2888395-d0bb-4e78-5da2-265b128f72c3","logPath":"/concourse-work-dir/depot/c2888395-d0bb-4e78-5da2-265b128f72c3/create.log","pidFilePath":"/concourse-work-dir/depot/c2888395-d0bb-4e78-5da2-265b128f72c3/pidfile","runc":"/concourse-work-dir/3.1.1/assets/bin/runc","session":"189.2.3"}}
{"timestamp":"1497359514.888193846","source":"guardian","message":"guardian.create.containerizer-create.create.runc","log_level":0,"data":{"bundle":"/concourse-work-dir/depot/c2888395-d0bb-4e78-5da2-265b128f72c3","handle":"c2888395-d0bb-4e78-5da2-265b128f72c3","message":"exit status 1","session":"189.2.3"}}
{"timestamp":"1497359514.888237238","source":"guardian","message":"guardian.create.containerizer-create.create.runc","log_level":0,"data":{"bundle":"/concourse-work-dir/depot/c2888395-d0bb-4e78-5da2-265b128f72c3","handle":"c2888395-d0bb-4e78-5da2-265b128f72c3","message":"container_linux.go:264: starting container process caused \"process_linux.go:339: container init caused \\\"rootfs_linux.go:57: mounting \\\\\\\"/concourse-work-dir/3.1.1/assets/bin/init\\\\\\\" to rootfs \\\\\\\"/concourse-work-dir/volumes/live/9481101c-fc45-475f-453e-072880b49f12/volume\\\\\\\" at \\\\\\\"/concourse-work-dir/volumes/live/9481101c-fc45-475f-453e-072880b49f12/volume/tmp/garden-init\\\\\\\" caused \\\\\\\"open /concourse-work-dir/volumes/live/9481101c-fc45-475f-453e-072880b49f12/volume/tmp/garden-init: permission denied\\\\\\\"\\\"\"\n","session":"189.2.3"}}
{"timestamp":"1497359514.888260841","source":"guardian","message":"guardian.create.containerizer-create.create.runc","log_level":0,"data":{"bundle":"/concourse-work-dir/depot/c2888395-d0bb-4e78-5da2-265b128f72c3","handle":"c2888395-d0bb-4e78-5da2-265b128f72c3","message":"container_linux.go:264: starting container process caused \"process_linux.go:339: container init caused \\\"rootfs_linux.go:57: mounting \\\\\\\"/concourse-work-dir/3.1.1/assets/bin/init\\\\\\\" to rootfs \\\\\\\"/concourse-work-dir/volumes/live/9481101c-fc45-475f-453e-072880b49f12/volume\\\\\\\" at \\\\\\\"/concourse-work-dir/volumes/live/9481101c-fc45-475f-453e-072880b49f12/volume/tmp/garden-init\\\\\\\" caused \\\\\\\"open /concourse-work-dir/volumes/live/9481101c-fc45-475f-453e-072880b49f12/volume/tmp/garden-init: permission denied\\\\\\\"\\\"\"\n","session":"189.2.3"}}
{"timestamp":"1497359514.888289213","source":"guardian","message":"guardian.create.containerizer-create.create.finished","log_level":1,"data":{"bundle":"/concourse-work-dir/depot/c2888395-d0bb-4e78-5da2-265b128f72c3","handle":"c2888395-d0bb-4e78-5da2-265b128f72c3","session":"189.2.3"}}
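Each guardian log line above is a standalone JSON object. The sketch below pulls only the runc failures out of a worker log. It matches on the "message" and "data" fields visible in the excerpt, skips the bare "exit status 1" event, and drops repeated copies of the same error (the excerpt logs it twice). The field names come from this excerpt, and treating them as a stable log format is an assumption.

```python
import json
from typing import Iterable, List, Tuple

# Event name copied from the guardian log excerpt in this thread.
RUNC_EVENT = "guardian.create.containerizer-create.create.runc"


def runc_errors(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return unique (handle, message) pairs from guardian runc log events.

    The bare "exit status 1" event is skipped because the detailed error
    arrives in a separate event for the same handle.
    """
    seen = set()
    errors = []
    for raw in lines:
        raw = raw.strip()
        if not raw.startswith("{"):
            continue  # blank or non-JSON line
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if entry.get("message") != RUNC_EVENT:
            continue
        data = entry.get("data", {})
        message = data.get("message", "").strip()
        if message == "exit status 1":
            continue
        key = (data.get("handle", ""), message)
        if key not in seen:  # the same error may be logged more than once
            seen.add(key)
            errors.append(key)
    return errors
```

On a worker, this could be fed the lines of the guardian log file, or piped output from kubectl logs, to list one entry per failing container handle.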

If I set the task’s privileged attribute to “true”, it starts working.

I tinkered with the worker configuration a bit and found that the issue disappears when I switch the worker to the “naive” baggageclaim driver (by starting the worker with the --baggageclaim-driver=naive command-line flag). I presume the issue has something to do with running non-privileged containers with runc from a volume backed by the overlay filesystem driver.
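One way to check the overlay theory on a given worker is to look up which filesystem backs a path under the work directory. The sketch below does this from the text of /proc/mounts by picking the longest mount point that contains the path. The sample paths in the test are hypothetical. The unescaping handles only spaces (\040) in mount points.

```python
def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts, one mount per line:
    "device mount_point fs_type options dump pass".
    """
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces in mount points as \040.
        mount_point = fields[1].replace("\\040", " ")
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point win, since it shadows earlier ones.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fields[2]
    return best_type
```

Inside the worker container, passing the path of a volume directory together with the contents of /proc/mounts would show whether that volume sits on overlay, ext4, or something else.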

Thanks for the information everyone, we’ve confirmed that this is a support issue with Concourse v3.1.0+ running on Google’s Container-Optimized OS with Kernel version 4.4.35+.

Reproduced this using a GCE cluster and the latest concourse/concourse Docker image. Digging into whether this is a kernel-specific issue or something complicated by Docker’s filesystem mounts. Running workers across all the distros! Stay tuned!

I can also confirm I’m getting a similar error on 3.3.0 deployed to k8s 1.6.4 on GKE via the Helm chart:

Example output from a git resource failure:

runc create: exit status 1: container_linux.go:264: starting container process caused "process_linux.go:339: container init caused \"rootfs_linux.go:57: mounting \\\"/concourse-work-dir/3.3.0/assets/bin/init\\\" to rootfs \\\"/concourse-work-dir/volumes/live/724a870f-a34b-4cd2-5509-125b652e0a77/volume\\\" at \\\"/concourse-work-dir/volumes/live/724a870f-a34b-4cd2-5509-125b652e0a77/volume/tmp/garden-init\\\" caused \\\"open /concourse-work-dir/volumes/live/724a870f-a34b-4cd2-5509-125b652e0a77/volume/tmp/garden-init: permission denied\\\"\""

I’m seeing a similar error for a docker build step after upgrading to 3.1.1:

mount: permission denied (are you root?)

Is this the same issue or a new one? I’m using the pinned v1.12.6 version for the ECR auth workaround:

- name: docker-image-2
  type: docker-image
  privileged: true
  source:
    repository: concourse/docker-image-resource
    tag: docker-1.12.6

I’ve recreated workers.

@vito Recreating the workers completely from scratch (volumes and everything) didn’t help. We finally downgraded to 3.0.1.

I had a similar problem with k8s 1.8.7 on GKE, deploying Concourse using Helm (also using COS).

I tried Concourse 3.9.0 with the btrfs driver, without success. The deploy worked, but when I tried to execute a build, it showed “No workers” in red. After deleting the Helm installation (with helm del --purge concourse) and reinstalling with the naive option, it worked.

imageTag: "3.9.0"
concourse:
  baggageclaimDriver: naive

@viglesiasce I’m confused by that: isn’t the file command running inside the container the worker runs in, rather than on the host? I think the actual problem is that the COS image kernel doesn’t have BTRFS support.

Edit: The PR I have on the helm chart works on kops 1.6.2 (k8s 1.6.7)
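To test the missing-BTRFS theory above on a node, check which filesystems the kernel has registered in /proc/filesystems. Containers share the host kernel, so reading this file inside the worker container should reflect the node's kernel. The mapping from baggageclaim driver to required filesystem is an assumption made for illustration, and so is the premise that "naive" needs no special filesystem. A filesystem built as a module that is not yet loaded will not be listed, so its absence is a hint rather than proof.

```python
from typing import List, Set

# Assumption: the kernel filesystem each baggageclaim driver relies on.
# The naive driver is assumed to need no special filesystem support.
DRIVER_FILESYSTEMS = {"btrfs": "btrfs", "overlay": "overlay"}


def registered_filesystems(proc_filesystems: str) -> Set[str]:
    """Parse /proc/filesystems: each line is an optional "nodev" flag plus a name."""
    return {line.split()[-1] for line in proc_filesystems.splitlines() if line.strip()}


def usable_drivers(proc_filesystems: str) -> List[str]:
    """List drivers whose filesystem is currently registered with the kernel."""
    available = registered_filesystems(proc_filesystems)
    drivers = ["naive"]
    for driver in sorted(DRIVER_FILESYSTEMS):
        if DRIVER_FILESYSTEMS[driver] in available:
            drivers.append(driver)
    return drivers
```

On a node where only overlay is registered, the function returns ["naive", "overlay"], which would be consistent with the btrfs-based deploy failing while the naive driver works.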