concourse: 3.1.1 jobs fail with "file not found" after flushing workers

Bug Report

Running 3.1.1, we see jobs failing with “file not found” errors. Flushing the workers (destroying and recreating the pods in Kubernetes) makes the errors go away for about a day. They then come back sporadically, and eventually nearly every build fails.

Below is an example log entry we see with the error:

{"timestamp":"1496684115.561470270","source":"atc","message":"atc.syncer.build-develop-ucp-loyalty:radar.scanner-failed","log_level":2,"data":{"error":"file not found","member":"build-develop-ucp-loyalty:resource:pr-svc-transaction-history-service","session":"17.17"}}
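To see which resources are affected, entries like the one above can be filtered out of the ATC log. This is a minimal sketch using only grep and sed, assuming the log holds one JSON object per line as in the example; the function name `failed_members` is hypothetical.

```shell
# Print the "member" field of every scanner-failed entry whose error is
# "file not found". Reads ATC log lines (one JSON object per line) on stdin.
failed_members() {
  grep 'radar\.scanner-failed' \
    | grep '"error":"file not found"' \
    | sed -n 's/.*"member":"\([^"]*\)".*/\1/p' \
    | sort -u
}
```

Usage would look like `failed_members < atc.log`, where `atc.log` stands for wherever the ATC output is captured.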

  • Concourse version: 3.1.1
  • Deployment type (BOSH/Docker/binary): Docker image running on CoreOS Kubernetes in AWS
  • Infrastructure/IaaS: AWS
  • Browser (if applicable): Chrome
  • Did this used to work? Yes. Previously on 3.0.1 no issue

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 11
  • Comments: 24 (6 by maintainers)

Most upvoted comments

help - I’m affected by the same problem (on docker)

My only workaround is to destroy the whole docker environment, since trying to prune the worker leads to a forbidden error, even with a stalled worker.

Faced the exact same issue running this pipeline (pipeline.yml) from the tutorial on the official vagrant box:

---
jobs:
- name: job-hello-world
  public: true
  plan:
  - task: hello-world
    config:
      platform: linux
      image_resource:
        type: docker-image
        source: {repository: busybox}
      run:
        path: echo
        args: [hello world]
Steps to reproduce:
1. Spin up the official vagrant vm
2. Set the mentioned pipeline
3. Check that job-hello-world finishes successfully when triggered manually
4. Halt vagrant vm and spin it up again
5. Check that job-hello-world exits with error “file not found”
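The reproduction steps can be sketched as a script. It assumes the Vagrantfile for the official box is already in the current directory and that the pipeline above is saved as pipeline.yml. The target name `lite`, the pipeline name `hello`, and the URL are assumptions, not values from the report. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Dry-run by default: commands are printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reproduce() {
  target="${1:-lite}"                      # assumed fly target name
  url="${2:-http://192.168.100.4:8080}"    # assumed address of the vagrant box

  run vagrant up                                                      # step 1
  run fly -t "$target" login -c "$url"
  run fly -t "$target" set-pipeline -n -p hello -c pipeline.yml       # step 2
  run fly -t "$target" unpause-pipeline -p hello
  run fly -t "$target" trigger-job -j hello/job-hello-world --watch   # step 3
  run vagrant halt                                                    # step 4
  run vagrant up
  run fly -t "$target" trigger-job -j hello/job-hello-world --watch   # step 5
}
```

Running `reproduce` prints the plan; if the bug is present, the second `trigger-job` is the one expected to end with “file not found”.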

Is anyone else running their workers in Docker? The issue that caused this for me was the change in 3.X. If I use docker-compose down, docker-compose pull, docker-compose up for my upgrade, all my workers end up “stalled” and have to be pruned so that new ones can be created before this issue goes away. It’s annoying because I have to manually prune workers every time I cycle them, since their IDs change.
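The manual pruning described here could be scripted roughly as follows. This is a sketch, not a tested fix: it assumes `fly workers` prints the worker name in the first column and the word "stalled" in a later column, and the function names are hypothetical. The old workers may also not show up as stalled immediately after the containers are recreated.

```shell
# Read `fly workers` output on stdin and print the names of stalled workers.
# Assumption: name is the first column, state appears as the word "stalled".
stalled_workers() {
  awk '{ for (i = 2; i <= NF; i++) if ($i == "stalled") { print $1; break } }'
}

# Recreate the containers, then prune every worker reported as stalled.
cycle_workers() {
  target="$1"   # fly target name, supplied by the caller
  docker-compose down
  docker-compose pull
  docker-compose up -d
  fly -t "$target" workers | stalled_workers | while read -r name; do
    fly -t "$target" prune-worker -w "$name"
  done
}
```

Usage would be `cycle_workers my-target`, run from the directory containing docker-compose.yml.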

I’m seeing it on Ubuntu 16.04, and it seems straightforward to reproduce. Set up a docker-compose.yml file as per the documentation (I need a CONCOURSE_GARDEN_DNS_SERVER environment variable), deploy a build, which will work, then bounce the containers (docker-compose stop, docker-compose start). The same build then fails with a ‘file not found’ error. I see it in 3.2.0 as well.

Same here. Binaries are installed directly on an Ubuntu machine. The last time ‘file not found’ occurred, I upgraded from version 3.2.1 to 3.3.1, rebooted, and everything worked fine. After rebooting again, the same error (file not found) occurred. I then tried re-installing 3.3.1 and rebooting, and ‘file not found’ still showed up. After removing the pipeline and re-adding it, the pipeline worked again. Paused pipelines do not seem to be affected; they worked just fine when unpaused.

I just got this on 4.1.0