concourse: Seemingly random crash in findOrCreateResourceConfigScope

Bug Report

Periodically, Concourse crashes with the following panic, which takes down the entire server:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x9751bc]

goroutine 19524 [running]:
github.com/concourse/concourse/atc/db.findOrCreateResourceConfigScope(0x275d280, 0xc00164a8d0, 0x2777fc0, 0xc000514a50, 0x26f9c40, 0xc00047b800, 0x274d260, 0xc000f3fc00, 0x2791ac0, 0xc0019bcb00, ...)
	/tmp/build/1c3187db/concourse/atc/db/resource_config.go:296 +0x1c1c
github.com/concourse/concourse/atc/db.(*build).SaveOutput(0xc000c0c1e0, 0xc00129ab10, 0xc, 0xc000ad7b60, 0xc001ba3680, 0x6, 0x8, 0xc001c652f0, 0xc00372a880, 0x2, ...)
	/tmp/build/1c3187db/concourse/atc/db/build.go:884 +0x32d
github.com/concourse/concourse/atc/engine/builder.(*putDelegate).SaveOutput(0xc000d4d310, 0x2763160, 0xc0019f0720, 0xc00129ab10, 0xc, 0xc00129ab08, 0x4, 0xc00129ab0c, 0x4, 0xc000fd6450, ...)
	/tmp/build/1c3187db/concourse/atc/engine/builder/delegate_factory.go:220 +0x35f
github.com/concourse/concourse/atc/exec.(*PutStep).Run(0xc0006d4600, 0x2746be0, 0xc001cc3c00, 0x272ff20, 0xc0012c2ef0, 0x1, 0xc0012c4520)
	/tmp/build/1c3187db/concourse/atc/exec/put_step.go:199 +0xeb6
github.com/concourse/concourse/atc/exec.LogErrorStep.Run(0x270f6a0, 0xc0006d4600, 0x7f6e1975cdc8, 0xc000d4d310, 0x2746be0, 0xc001cc3c00, 0x272ff20, 0xc0012c2ef0, 0xc0012c2ef0, 0x0)
	/tmp/build/1c3187db/concourse/atc/exec/log_error_step.go:30 +0xe4
github.com/concourse/concourse/atc/exec.OnSuccessStep.Run(0x2718020, 0xc0012befe0, 0x2718020, 0xc0012bf000, 0x2746be0, 0xc001cc3c00, 0x272ff20, 0xc0012c2ef0, 0x0, 0x0)
	/tmp/build/1c3187db/concourse/atc/exec/on_success.go:29 +0x60
github.com/concourse/concourse/atc/exec.EnsureStep.Run(0x2718020, 0xc0012befc0, 0x2718120, 0xc0012bf020, 0x2746be0, 0xc001cc3c00, 0x272ff20, 0xc0012c2ef0, 0xc000048028, 0x0)
	/tmp/build/1c3187db/concourse/atc/exec/ensure_step.go:44 +0xfa
github.com/concourse/concourse/atc/exec.InParallelStep.Run.func1(0xc0016e8c40, 0xc0030a9f80, 0x2717ee0, 0xc0012bf040, 0x2746be0, 0xc001cc3c00, 0x272ff20, 0xc0012c2ef0, 0xc0012bf060, 0x2, ...)
	/tmp/build/1c3187db/concourse/atc/exec/in_parallel.go:61 +0xb4
created by github.com/concourse/concourse/atc/exec.InParallelStep.Run
	/tmp/build/1c3187db/concourse/atc/exec/in_parallel.go:56 +0x26e

Steps to Reproduce

It’s not yet clear what causes this.

Expected Results

Ideally, Concourse would not go down completely.

Actual Results

The server comes down completely

Version Info

  • Concourse version: 5.6.0
  • Deployment type (BOSH/Docker/binary): Docker
  • Infrastructure/IaaS: Google Cloud
  • Browser (if applicable): N/A
  • Did this used to work? Yes

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 15 (13 by maintainers)

Most upvoted comments

We found a way to reproduce the bug described by this ticket in Concourse 5.6.0. Briefly:

  • Set a pipeline that contains a job and a resource that is updated by a put step in the job
  • Launch a build in the job
  • While the build is running but before the put step runs, change the type field of the resource's resource_type in the pipeline configuration and apply the new configuration with fly set-pipeline
  • Wait for the put step to be executed; when it is, you should observe the seg fault in findOrCreateResourceConfigScope
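The steps above suggest that the put step ends up working with data that no longer matches the pipeline configuration. The sketch below illustrates that general class of bug and one defensive pattern for it, assuming the crash comes from a lookup that returns nil after the pipeline is re-set. This is an assumption, not a confirmed diagnosis, and every type and function in it (`resourceType`, `resource`, `findResourceType`, `scopeKey`) is hypothetical rather than the actual code in atc/db.

```go
package main

import (
	"errors"
	"fmt"
)

// resourceType and resource are hypothetical stand-ins, not Concourse's types.
type resourceType struct {
	Name string
	Type string
}

type resource struct {
	Name string
	Type string
}

var errResourceTypeNotFound = errors.New("resource type not found")

// findResourceType returns nil when nothing matches, mimicking a lookup
// whose result can disappear when the pipeline changes mid-build.
func findResourceType(types []*resourceType, name string) *resourceType {
	for _, t := range types {
		if t.Name == name {
			return t
		}
	}
	return nil
}

// scopeKey builds a key for the resource. It returns an error instead of
// dereferencing a nil lookup result.
func scopeKey(types []*resourceType, r resource) (string, error) {
	rt := findResourceType(types, r.Type)
	if rt == nil {
		return "", fmt.Errorf("%w: %q", errResourceTypeNotFound, r.Type)
	}
	return rt.Type + "/" + r.Name, nil
}

func main() {
	r := resource{Name: "artifact", Type: "custom"}

	// State when the build started.
	types := []*resourceType{{Name: "custom", Type: "registry-image"}}
	fmt.Println(scopeKey(types, r))

	// State after the pipeline was re-set: the lookup no longer matches.
	types = []*resourceType{{Name: "custom-v2", Type: "registry-image"}}
	fmt.Println(scopeKey(types, r))
}
```

With a check like this the put step would fail with an error for that build instead of panicking.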

@cirocosta @xtremerui would a test case that reproduces the problem help with your investigations?

Hi @antonu17,

As there were changes in the database schema between those two versions, you’ll have to manually perform a downgrade - see https://concourse-ci.org/concourse-web.html#downgrading

P.S. You can confirm that there were changes by looking at atc/db/migrations:

$ git diff v5.4.0..v5.6.0 --stat | grep migrations
 .../migrations/1522178770_add_job_tags.up.go       |    4 +-
 .../migrations/1563997651_users_table.down.sql     |    3 +
 .../migrations/1563997651_users_table.up.sql       |   10 +
 .../migrations/1565800062_create_checks.down.sql   |    3 +
 .../migrations/1565800062_create_checks.up.sql     |   20 +

I see. Thank you! For some reason I was sure that concourse web runs the necessary migrations on launch for both upgrades and downgrades. 🤷‍♂