concourse: Concourse cannot handle more than 2147483647 checks/builds

Summary

When the id column in the builds table exceeds 2147483647 (the maximum value of a 32-bit signed integer), all subsequent builds fail.

Errors like the following appear in the logs:

concourse-web-1     | {"timestamp":"2022-05-19T20:54:03.056255000Z","level":"error","source":"atc","message":"atc.tracker.tick.run.lock.acquire.failed-to-register-in-db","data":{"build":"26","build_id":2147483660,"error":"pq: value \"2147483660\" is out of range for type integer","id":[1,2147483660],"job":"job","pipeline":"example","session":"26.486.4.1.1","team":"main"}}
concourse-web-1     | {"timestamp":"2022-05-19T20:54:03.056311307Z","level":"error","source":"atc","message":"atc.tracker.tick.run.failed-to-get-lock","data":{"build":"26","build_id":2147483660,"error":"pq: value \"2147483660\" is out of range for type integer","job":"job","pipeline":"example","session":"26.486.4","team":"main"}}
concourse-db-1      | 2022-05-19 20:54:03.056 UTC [70] ERROR:  value "2147483660" is out of range for type integer
concourse-db-1      | 2022-05-19 20:54:03.056 UTC [70] CONTEXT:  unnamed portal parameter $2 = '...'
concourse-db-1      | 2022-05-19 20:54:03.056 UTC [70] STATEMENT:  SELECT pg_try_advisory_lock($1,$2)
concourse-db-1      | 2022-05-19 20:54:03.056 UTC [70] ERROR:  value "2147483656" is out of range for type integer
concourse-db-1      | 2022-05-19 20:54:03.056 UTC [70] CONTEXT:  unnamed portal parameter $2 = '...'
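
The failing statement in the database log is the two-key form of pg_try_advisory_lock. Per the PostgreSQL documentation, that form takes two 32-bit integer keys (the single-key form takes one bigint), so a build ID bound to $2 is rejected once it no longer fits in 32 bits, whatever the type of builds.id. A minimal Python sketch of that range check, using the values from the logs above; fits_int4 is a hypothetical helper for illustration and is not part of Concourse or its database driver:

```python
# Range of PostgreSQL's 4-byte signed "integer" type.
INT4_MIN = -2**31
INT4_MAX = 2**31 - 1  # 2147483647

def fits_int4(value: int) -> bool:
    """Return True if value is within the range of a Postgres integer."""
    return INT4_MIN <= value <= INT4_MAX

# Build IDs taken from the error messages above, plus the boundary value.
for build_id in (2147483647, 2147483656, 2147483660):
    status = "ok" if fits_int4(build_id) else "out of range for type integer"
    print(build_id, status)
```

Only the boundary value 2147483647 passes the check; both IDs from the logs fall outside the range.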

Steps to reproduce

  1. Start up the dev env: docker-compose up
  2. Start the every 30s pipeline: fly -t dev set-pipeline -p example -c examples/pipelines/time-triggered.yml
  3. Unpause the pipeline
  4. Let it run for a few minutes (this may not be strictly necessary)
  5. Log into the DB: ./hack/db
  6. Restart the builds_id_seq sequence at 2147483647:
concourse=# alter sequence builds_id_seq restart with 2147483647;
ALTER SEQUENCE

Expected results

Builds continue to work

Actual results

Concourse system stops executing new builds

Additional context

  1. Schema columns that were not updated. We noticed a few build_id and related columns that were not converted in the migration to bigint:

https://github.com/concourse/concourse/blob/8c8eb0565101b287107fe9d36ef970505ad166b7/atc/db/migration/migrations/1603401316_alter_build_integers_to_bigints.up.sql

table: build_comments
column: build_id

table: builds
column: rerun_of

table: successful_build_outputs
column: rerun_of

table: jobs
column: first_logged_build_id

There may be more.
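
One way to look for further stragglers is to list every build-related column with its type and flag those still typed integer. The sketch below is hypothetical: the query uses the standard information_schema.columns view, but the name patterns and the still_integer helper are assumptions, and the sample rows are only the columns named in this issue (plus builds.id for contrast), not real query output.

```python
# Query against the standard information_schema.columns view.
# The name patterns are an assumption about how build-ID columns are named.
FIND_COLUMNS_SQL = """
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
  AND (column_name LIKE '%build_id%' OR column_name = 'rerun_of')
ORDER BY table_name, column_name;
"""

def still_integer(rows):
    """Return (table, column) pairs whose type is still 4-byte integer."""
    return [(t, c) for t, c, data_type in rows if data_type == "integer"]

# Illustrative rows only: the columns reported in this issue, plus builds.id,
# which is already bigint. Real input would come from running a query such as
# FIND_COLUMNS_SQL against the Concourse database.
sample_rows = [
    ("build_comments", "build_id", "integer"),
    ("builds", "id", "bigint"),
    ("builds", "rerun_of", "integer"),
    ("jobs", "first_logged_build_id", "integer"),
    ("successful_build_outputs", "rerun_of", "integer"),
]

for table, column in still_integer(sample_rows):
    print(f"{table}.{column} is still integer")
```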

  2. In our production Concourse cluster, where this originally happened, one of the errors we saw was:
LOG: execute <unnamed>: SELECT completed FROM builds WHERE id = $1
DETAIL: parameters: $1 = '2147493588'
ERROR: value "2147493588" is out of range for type integer

What’s weird is that builds.id is already bigint, and so is its sequence.

That error message appears when Postgres is told to cast a string to an integer, as reproduced below:

testdb=# CREATE TABLE foobar (test_id bigint);
CREATE TABLE
 
testdb=# SELECT * FROM foobar WHERE test_id = 2147483648;
test_id
---------
(0 rows)
 
testdb=# SELECT * FROM foobar WHERE test_id = '2147483648'::integer;
ERROR: value "2147483648" is out of range for type integer

So it is likely that somewhere in the code a string is being passed to Postgres with an instruction to cast it to an integer before it reaches a bigint field.

Triaging info

  • Concourse version: 7.4.4. Test was against master Concourse branch.
  • Browser (if applicable):
  • Did this used to work? Not likely

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 17 (9 by maintainers)

Most upvoted comments

It looks like we’ve hit the limit of 2147483647 as also mentioned in #6354 today, and are therefore pretty broken. Is a new public release imminent that will address this, or are there other mitigations that we can put in place now?

For us, it seems the builds table’s index was still in integer mode. After we rebuilt the index, build records could be saved again and builds were able to run.

atc=> \d builds_pkey;
      Index "public.builds_pkey"
 Column |  Type  | Key? | Definition
--------+--------+------+------------
 id     | bigint | yes  | id
primary key, btree, for table "public.builds"

This fixed the mismatch between the table column and the sequence type:

REINDEX INDEX builds_pkey;

I read it like pg_try_advisory_lock(lockType, id % 2147483647).
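
A minimal sketch of what that reading would mean, assuming the build ID is reduced modulo 2147483647 before being used as the second advisory-lock key. fold_lock_key is a hypothetical name, and whether Concourse actually does this is not confirmed here.

```python
INT4_MAX = 2**31 - 1  # 2147483647

def fold_lock_key(build_id: int) -> int:
    """Reduce a non-negative 64-bit build ID to a value in [0, INT4_MAX)."""
    return build_id % INT4_MAX

print(fold_lock_key(2147483660))  # 13
print(fold_lock_key(13))          # 13
```

The folded key always fits in a 32-bit integer, at the cost that build IDs exactly 2147483647 apart share the same lock key.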