k3s: Server doesn't start after upgrade to v1.21.7+k3s1 (ac705709): bootstrap data already found and encrypted with different token
Environmental Info:
K3s Version: v1.21.7+k3s1 (ac705709)
Node(s) CPU architecture, OS, and Version:
5.15.4-201.fc35.x86_64 #1 SMP Tue Nov 23 18:54:50 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux Fedora Linux 35
Cluster Configuration:
I have 2 servers using an external PostgreSQL database and a couple of agents.
There is a system-upgrade-controller configured to follow the channel: https://update.k3s.io/v1-release/channels/stable
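For reference, a minimal sketch of what such an upgrade Plan could look like (the Plan name, namespace, and node selector here are assumptions on my part; the channel URL is the one above):

kubectl apply -f - <<'EOF'
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1            # upgrade one server node at a time
  cordon: true              # cordon each node before upgrading it
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/master
        operator: Exists
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  channel: https://update.k3s.io/v1-release/channels/stable
EOF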
Describe the bug:
Today an automatic upgrade from v1.21.5+k3s2 to v1.21.7+k3s1 happened on one server. It stopped responding. The service log shows the following failure:
k3s[1644879]: time="2021-12-04T11:17:49.379252883+01:00" level=info msg="Starting k3s v1.21.7+k3s1 (ac705709)"
k3s[1644879]: time="2021-12-04T11:17:49.420438206+01:00" level=info msg="Configuring postgres database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
k3s[1644879]: time="2021-12-04T11:17:49.420639983+01:00" level=info msg="Configuring database table schema and indexes, this may take a moment..."
k3s[1644879]: time="2021-12-04T11:17:49.423536941+01:00" level=info msg="Database tables and indexes are up to date"
k3s[1644879]: time="2021-12-04T11:17:49.448156983+01:00" level=info msg="Kine listening on unix://kine.sock"
k3s[1644879]: time="2021-12-04T11:17:49.480807423+01:00" level=fatal msg="starting kubernetes: preparing server: bootstrap data already found and encrypted with different token"
systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
I have verified /var/lib/rancher/k3s/server/token is the same on both servers.
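For example, a quick way to compare the files on each server without printing the secret itself:

# run on every server; the output must be identical for the tokens to match
sudo sha256sum /var/lib/rancher/k3s/server/token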
After downgrading the k3s binary back to v1.21.5+k3s2, the server is able to start again.
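Roughly what such a manual downgrade looks like (a sketch only, assuming the default /usr/local/bin/k3s location used by the install script; the release URL follows the standard k3s GitHub releases layout):

sudo systemctl stop k3s
curl -Lo /tmp/k3s https://github.com/k3s-io/k3s/releases/download/v1.21.5%2Bk3s2/k3s
sudo install -m 0755 /tmp/k3s /usr/local/bin/k3s
sudo systemctl start k3s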
Steps To Reproduce:
- Installed K3s:
Cluster installed 342 days ago using:
export INSTALL_K3S_COMMIT=fadc5a8057c244df11757cd47cc50cc4a4cf5887
export K3S_DATASTORE_ENDPOINT='postgres://[…]'
./k3s-install server \
  --tls-san api.chi.pipebreaker.pl \
  --disable traefik \
  --flannel-backend=wireguard
Then it was auto-upgraded using system-upgrade-controller, stable channel, up to v1.21.5+k3s2.
Expected behavior:
The server should start after the upgrade to v1.21.7+k3s1.
Actual behavior:
Startup failed with level=fatal msg="starting kubernetes: preparing server: bootstrap data already found and encrypted with different token"
Did the handling of bootstrap data encryption change between v1.21.5+k3s2 and v1.21.7+k3s1?
Additional context / logs:
There are corresponding logs from PostgreSQL:
2021-12-04 11:05:52.461 CET [2243563] ERROR: relation "key_value" does not exist at character 22
2021-12-04 11:05:52.461 CET [2243563] STATEMENT: SELECT COUNT(*) FROM key_value
2021-12-04 11:05:52.511 CET [2243567] LOG: could not receive data from client: Connection reset by peer
2021-12-04 11:05:52.511 CET [2243566] LOG: could not send data to client: Connection reset by peer
2021-12-04 11:05:52.511 CET [2243566] STATEMENT:
SELECT (
SELECT MAX(rkv.id) AS id
FROM kine AS rkv), (
SELECT MAX(crkv.prev_revision) AS prev_revision
FROM kine AS crkv
WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value
FROM kine AS kv
WHERE
kv.name LIKE $1 AND
kv.id > $2
ORDER BY kv.id ASC LIMIT 500
2021-12-04 11:05:52.511 CET [2243565] LOG: could not send data to client: Connection reset by peer
2021-12-04 11:05:52.511 CET [2243565] STATEMENT:
SELECT (
SELECT MAX(rkv.id) AS id
FROM kine AS rkv), (
SELECT MAX(crkv.prev_revision) AS prev_revision
FROM kine AS crkv
WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value
FROM kine AS kv
JOIN (
SELECT MAX(mkv.id) AS id
FROM kine AS mkv
WHERE
mkv.name LIKE $1
GROUP BY mkv.name) maxkv
ON maxkv.id = kv.id
WHERE
(kv.deleted = 0 OR $2)
ORDER BY kv.id ASC
LIMIT 1000
2021-12-04 11:05:52.511 CET [2243566] FATAL: connection to client lost
Backporting
- Needs backporting to older releases
About this issue
- State: closed
- Created 3 years ago
- Comments: 28 (16 by maintainers)
We haven’t changed the token hash calculation since v0.11.0-alpha3. You mentioned above that you did not specify the token when initially creating the cluster; this means that each node had a different value in /var/lib/rancher/k3s/server/token when the cluster was upgraded. I believe you need to take manual action to find the node that was upgraded first (the one whose token hash matches the bootstrap hash) and set that same token value on the other nodes.
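In practice that could look roughly like the following (a sketch only; which server goes first depends on whose token matches the bootstrap data, and paths are the k3s defaults):

# on the server that was upgraded first, read its token
sudo cat /var/lib/rancher/k3s/server/token

# on each other server: stop k3s, write that exact value into the token file,
# then start the upgraded version again
sudo systemctl stop k3s
echo '<token value from the first server>' | sudo tee /var/lib/rancher/k3s/server/token
sudo systemctl start k3s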