concourse: 3.1.1 jobs fail with "file not found" after flushing workers
Bug Report
Running 3.1.1, we see jobs failing with “file not found” errors. Flushing the workers (destroying and recreating the pods in Kubernetes) makes the errors go away for about a day. They then come back sporadically, and eventually nearly every build fails.
Below is an example log entry we see with the error:
{"timestamp":"1496684115.561470270","source":"atc","message":"atc.syncer.build-develop-ucp-loyalty:radar.scanner-failed","log_level":2,"data":{"error":"file not found","member":"build-develop-ucp-loyalty:resource:pr-svc-transaction-history-service","session":"17.17"}}
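For anyone triaging similar entries, here is a minimal sketch of pulling the useful fields out of a log line like the one above. It assumes only what that one entry shows: each line is a single JSON object, the timestamp is a string of epoch seconds, and the error and member sit under "data". The function name `summarize` is made up for illustration and is not part of Concourse.

```python
import json
from datetime import datetime, timezone

# The ATC log line quoted in the report, copied verbatim.
LOG_LINE = (
    '{"timestamp":"1496684115.561470270","source":"atc",'
    '"message":"atc.syncer.build-develop-ucp-loyalty:radar.scanner-failed",'
    '"log_level":2,"data":{"error":"file not found",'
    '"member":"build-develop-ucp-loyalty:resource:pr-svc-transaction-history-service",'
    '"session":"17.17"}}'
)


def summarize(line: str) -> dict:
    """Extract time, source, message, error and member from one log line."""
    entry = json.loads(line)
    data = entry.get("data", {})
    # The timestamp is a string holding epoch seconds with a fractional part.
    when = datetime.fromtimestamp(float(entry["timestamp"]), tz=timezone.utc)
    return {
        "time": when.replace(microsecond=0).isoformat(),
        "source": entry.get("source"),
        "message": entry.get("message"),
        "error": data.get("error"),
        "member": data.get("member"),
    }


if __name__ == "__main__":
    for key, value in summarize(LOG_LINE).items():
        print(f"{key}: {value}")
```

Filtering a whole log file for entries where the error is "file not found" and grouping them by member would show which resources are affected and when the failures started.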
- Concourse version: 3.1.1
- Deployment type (BOSH/Docker/binary): Docker image running on CoreOS Kubernetes in AWS
- Infrastructure/IaaS: AWS
- Browser (if applicable): Chrome
- Did this used to work? Yes. Previously on 3.0.1 no issue
About this issue
- State: closed
- Created 7 years ago
- Reactions: 11
- Comments: 24 (6 by maintainers)
help - I’m affected by the same problem (on docker)
My only workaround is to destroy the whole Docker environment, since trying to prune the worker leads to a “forbidden” error, even with a stalled worker.
Faced the exact same issue, running this pipeline from the tutorial on the official vagrant box: pipeline.yml
Is anyone else running their workers in Docker? The issue that caused this for me was the change in 3.X. If I use docker-compose down, docker-compose pull, docker-compose up for my upgrade, all my workers end up “stalled” and have to be pruned so that new ones can be created before this issue goes away. It’s annoying because I have to manually prune workers every time I cycle them, since their IDs change.
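The manual pruning described above could be scripted. The following is only a sketch: it assumes you already have the worker list as dictionaries with "name" and "state" keys (that shape is an assumption, not something fly is shown to produce here), and it builds one `fly prune-worker --worker NAME` command per stalled worker. The target name "ci" and both function names are hypothetical.

```python
import subprocess


def prune_commands(workers, target="ci"):
    """Return one `fly prune-worker` argv list per stalled worker.

    `workers` is assumed to be a list of dicts with "name" and "state" keys;
    `target` is a placeholder for whatever fly target you are logged in to.
    """
    return [
        ["fly", "-t", target, "prune-worker", "--worker", w["name"]]
        for w in workers
        if w.get("state") == "stalled"
    ]


def run_all(commands):
    """Run each command in order; stop on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the commands separately from running them makes it possible to print and review what would be pruned first. As noted earlier in the thread, pruning can still fail with a “forbidden” error in some setups, so this does not replace a real fix.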
I’m seeing it on Ubuntu 16.04, and it seems straightforward to reproduce. Set up a docker-compose.yml file as per the documentation (I need a CONCOURSE_GARDEN_DNS_SERVER environment variable), deploy a build, which will work, then bounce the containers (docker-compose stop, docker-compose start). The same build now fails with a ‘file not found’ error. I see it in 3.2.0 as well.
Same here. Binaries are installed directly on an Ubuntu machine. The last time ‘file not found’ occurred, I upgraded from version 3.2.1 to 3.3.1, rebooted, and everything worked fine. After rebooting again, the same error (file not found) occurred. I then tried re-installing 3.3.1 and rebooted, and ‘file not found’ still showed up. After removing the pipeline and re-adding it, the pipeline worked again. Paused pipelines do not seem to be affected; they worked just fine when unpaused.
I just got this on 4.1.0