concourse: Lots of `unknown handle` errors

Since upgrading Concourse to 3.0.1, and now 3.1.1, we have been seeing many `unknown handle` errors across all pipelines, in both AWS & vSphere.

We have re-created the workers, but the errors are not going away. Is this a known bug? What else can we try?

  • Concourse version: 3.1.1
  • Deployment type: BOSH
  • Infrastructure/IaaS: same issues across 2 independent Concourse deployments, vSphere & AWS
  • Did this used to work? Yes

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 6
  • Comments: 37 (14 by maintainers)

Most upvoted comments

I think this issue needs to be re-opened. I’m on Concourse 3.4.1 running on k8s via the Helm chart. Regardless of how a worker process ended (SIGKILL, SIGTERM, land-worker, retire-worker), when it starts up again it always cleans up its container volumes. TSA (?) sees the worker come back under the same name and expects the volumes to be there, but they aren’t. I see this in the logs at startup:

{"timestamp":"1505767465.598683357","source":"guardian","message":"guardian.start.clean-up-container.start","log_level":1,"data":{"handle":"5a86c759-daff-4dee-44cc-a87ed154533b","session":"6.10"}}
{"timestamp":"1505767465.598800182","source":"guardian","message":"guardian.start.clean-up-container.destroy.started","log_level":1,"data":{"handle":"5a86c759-daff-4dee-44cc-a87ed154533b","session":"6.10.1"}}
{"timestamp":"1505767465.604939699","source":"guardian","message":"guardian.start.clean-up-container.destroy.state","log_level":1,"data":{"handle":"5a86c759-daff-4dee-44cc-a87ed154533b","session":"6.10.1","state":{"Pid":123,"Status":"created"}}}

I have a PR to the chart that works around this issue: https://github.com/kubernetes/charts/pull/2109
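
For anyone trying to confirm the same pattern on their own workers, here is a minimal sketch that pulls the container handles out of guardian log lines like the ones quoted above. It assumes the worker log has one JSON object per line with the same `message` and `data.handle` fields shown in the excerpt; the function name `handles_destroyed_at_startup` and the `SAMPLE` list are made up for illustration and are not part of Concourse or guardian.

```python
import json

# Message name copied from the log excerpt above. Assumption: every startup
# clean-up of a container emits one line with exactly this message.
DESTROY_STARTED = "guardian.start.clean-up-container.destroy.started"


def handles_destroyed_at_startup(lines):
    """Return container handles, in first-seen order, that guardian began destroying."""
    seen = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log line
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if entry.get("message") != DESTROY_STARTED:
            continue
        handle = entry.get("data", {}).get("handle")
        if handle and handle not in seen:
            seen.append(handle)
    return seen


# The three log lines from the comment above, used as sample input.
SAMPLE = [
    '{"timestamp":"1505767465.598683357","source":"guardian","message":"guardian.start.clean-up-container.start","log_level":1,"data":{"handle":"5a86c759-daff-4dee-44cc-a87ed154533b","session":"6.10"}}',
    '{"timestamp":"1505767465.598800182","source":"guardian","message":"guardian.start.clean-up-container.destroy.started","log_level":1,"data":{"handle":"5a86c759-daff-4dee-44cc-a87ed154533b","session":"6.10.1"}}',
    '{"timestamp":"1505767465.604939699","source":"guardian","message":"guardian.start.clean-up-container.destroy.state","log_level":1,"data":{"handle":"5a86c759-daff-4dee-44cc-a87ed154533b","session":"6.10.1","state":{"Pid":123,"Status":"created"}}}',
]

for handle in handles_destroyed_at_startup(SAMPLE):
    print(handle)
```

If a handle printed here also shows up in an `unknown handle` error on a build, that would be consistent with the explanation above (the worker removed the container on restart while the rest of the system still expected it), though it does not prove it on its own.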