concourse: Concourse cannot handle more than 2147483647 checks/builds
Summary
When the id in the builds table (logged as build_id) exceeds 2147483647, all future builds fail to be created.
The logs show errors like:
concourse-web-1 | {"timestamp":"2022-05-19T20:54:03.056255000Z","level":"error","source":"atc","message":"atc.tracker.tick.run.lock.acquire.failed-to-register-in-db","data":{"build":"26","build_id":2147483660,"error":"pq: value \"2147483660\" is out of range for type integer","id":[1,2147483660],"job":"job","pipeline":"example","session":"26.486.4.1.1","team":"main"}}
concourse-web-1 | {"timestamp":"2022-05-19T20:54:03.056311307Z","level":"error","source":"atc","message":"atc.tracker.tick.run.failed-to-get-lock","data":{"build":"26","build_id":2147483660,"error":"pq: value \"2147483660\" is out of range for type integer","job":"job","pipeline":"example","session":"26.486.4","team":"main"}}
concourse-db-1 | 2022-05-19 20:54:03.056 UTC [70] ERROR: value "2147483660" is out of range for type integer
concourse-db-1 | 2022-05-19 20:54:03.056 UTC [70] CONTEXT: unnamed portal parameter $2 = '...'
concourse-db-1 | 2022-05-19 20:54:03.056 UTC [70] STATEMENT: SELECT pg_try_advisory_lock($1,$2)
concourse-db-1 | 2022-05-19 20:54:03.056 UTC [70] ERROR: value "2147483656" is out of range for type integer
concourse-db-1 | 2022-05-19 20:54:03.056 UTC [70] CONTEXT: unnamed portal parameter $2 = '...'
Steps to reproduce
- Start up the dev env:
docker-compose up
- Start the every 30s pipeline:
fly -t dev set-pipeline -p example -c examples/pipelines/time-triggered.yml
- Unpause the pipeline
- Let it run for a few minutes (this may not be strictly necessary)
- Log into the DB: ./hack/db
- Restart the builds_id_seq sequence to start at 2147483647.
concourse=# alter sequence builds_id_seq restart with 2147483647;
ALTER SEQUENCE
Expected results
Builds continue to work
Actual results
Concourse system stops executing new builds
Additional context
- Unupdated schema columns: we noticed a few build_id and related fields that were not updated in the transition to bigint:
table: build_comments, column: build_id
table: builds, column: rerun_of
table: successful_build_outputs, column: rerun_of
table: jobs, column: first_logged_build_id
There may be more.
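One way to widen the columns listed above is `ALTER TABLE ... ALTER COLUMN ... TYPE bigint`. The sketch below only prints such statements for the four columns named in this issue. It is not the project's actual migration, and `widen_statements` is a hypothetical helper name.

```python
# Columns reported above as not yet converted to bigint.
NARROW_COLUMNS = [
    ("build_comments", "build_id"),
    ("builds", "rerun_of"),
    ("successful_build_outputs", "rerun_of"),
    ("jobs", "first_logged_build_id"),
]


def widen_statements(columns):
    """Hypothetical helper: build one ALTER TABLE statement per (table, column)."""
    return [
        f"ALTER TABLE {table} ALTER COLUMN {column} TYPE bigint;"
        for table, column in columns
    ]


if __name__ == "__main__":
    # Print the statements so they can be reviewed before being run in psql.
    for statement in widen_statements(NARROW_COLUMNS):
        print(statement)
```

Since the list may be incomplete, the output is a starting point for review rather than a complete fix.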
- In our production Concourse cluster on which this originally happened, one of the errors we saw was:
LOG: execute <unnamed>: SELECT completed FROM builds WHERE id = $1
DETAIL: parameters: $1 = '2147493588'
ERROR: value "2147493588" is out of range for type integer
What’s weird is that the id column of builds is already bigint, as is its sequence.
That error message appears when Postgres is told to typecast a string to an integer, as reproduced below:
testdb=# CREATE TABLE foobar (test_id bigint);
CREATE TABLE
testdb=# SELECT * FROM foobar WHERE test_id = 2147483648;
test_id
---------
(0 rows)
testdb=# SELECT * FROM foobar WHERE test_id = '2147483648'::integer;
ERROR: value "2147483648" is out of range for type integer
So it is likely that somewhere in the code a string is being passed to Postgres with an instruction to typecast it to an integer before it is stored in a bigint field.
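The range check behind that error can be mimicked outside the database. The sketch below is a plain-Python model, not Postgres or Concourse code: `cast_text` is a hypothetical helper that parses a string parameter and applies the bounds of `integer` (32-bit signed) or `bigint` (64-bit signed), raising an error worded like the one in the logs.

```python
# Bounds of the two Postgres integer types involved in this issue.
PG_RANGES = {
    "integer": (-2**31, 2**31 - 1),  # int4, max 2147483647
    "bigint": (-2**63, 2**63 - 1),   # int8
}


def cast_text(value: str, pg_type: str) -> int:
    """Hypothetical model of casting a text parameter to a Postgres integer type."""
    low, high = PG_RANGES[pg_type]
    number = int(value)
    if not low <= number <= high:
        raise ValueError(f'value "{value}" is out of range for type {pg_type}')
    return number


if __name__ == "__main__":
    # The same parameter value is fine as bigint but rejected as integer.
    for pg_type in ("bigint", "integer"):
        try:
            print(pg_type, cast_text("2147493588", pg_type))
        except ValueError as err:
            print(pg_type, "ERROR:", err)
```

The point of the model is that the column type never enters the check: the failure depends only on the type the parameter is cast to.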
Triaging info
Concourse version: 7.4.4. Test was against master Concourse branch.
Browser (if applicable):
Did this used to work? Not likely
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 17 (9 by maintainers)
It looks like we’ve hit the limit of 2147483647, as also mentioned in #6354, today, and are therefore pretty broken. Is a new public release imminent that will address this, or are there other mitigations that we can put in place now?
For us, it seems to be the builds table’s index that is in integer mode. We rebuilt the index, and builds records can be saved again and the builds are able to run.
atc=> \d builds_pkey;
Index "public.builds_pkey"
 Column |  Type  | Key? | Definition
--------+--------+------+------------
 id     | bigint | yes  | id
primary key, btree, for table "public.builds"
<<< this fixed the mismatch b/n table col & sequence type:
REINDEX INDEX builds_pkey;
I read it like pg_try_advisory_lock(lockType, id % 2147483647).
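A minimal sketch of that reading, assuming the two-argument form `pg_try_advisory_lock(key1 integer, key2 integer)`, where each key must fit in 32 bits. `lock_keys` is a hypothetical helper, not Concourse code. It folds the build id with the modulus from the comment so that the second key stays in range.

```python
from typing import Tuple

INT4_MAX = 2147483647  # largest value a Postgres integer (int4) can hold


def lock_keys(lock_type: int, build_id: int) -> Tuple[int, int]:
    """Hypothetical helper: fold a 64-bit id into an int4-safe lock key pair."""
    return lock_type, build_id % INT4_MAX


if __name__ == "__main__":
    # The failing id from the log now yields an in-range second key.
    print(lock_keys(1, 2147483660))
    # Caveat: ids that differ by a multiple of INT4_MAX share a key.
    print(lock_keys(1, 13) == lock_keys(1, 13 + INT4_MAX))
```

The trade-off is that the folding is not one-to-one: builds 13 and 2147483660 would contend for the same lock, so this keeps the call valid at the cost of occasional false contention.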