k3s: 2nd server fails to join existing cluster - starting kubernetes: preparing server: bootstrap data already found and encrypted with different token
Environmental Info: K3s Version: k3s version v1.21.3+k3s1 (1d1f220f) go version go1.16.6
Node(s) CPU architecture, OS, and Version: Linux pchost0 5.4.0-81-generic #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration: 1 master, 5 agents
Describe the bug:
When adding a second server to an existing cluster, the k3s service fails to start with the following error:
level=fatal msg="starting kubernetes: preparing server: bootstrap data already found and encrypted with different token"
Steps To Reproduce:
- Installed K3s:
- curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='--write-kubeconfig-mode=644' sh -s - server --datastore-endpoint="mysql://user:password@tcp(xxx.home:3312)/kubernetes" --node-taint CriticalAddonsOnly=true:NoExecute --tls-san cluster.home
Expected behavior:
The second server should join the cluster and the k3s service should start.
Actual behavior:
k3s service fails to start.
Additional context / logs:
This is a fresh install of Ubuntu 20.04. No prior installations were attempted before running into this error. The existing cluster is working fine and all nodes have joined. After hitting this error I upgraded an agent node without issue, and also upgraded the existing server without issue. The load balancer is working fine with DNS cluster.home, with the existing server as its only member.
Aug 19 08:05:12 pchost0 systemd[1]: k3s.service: Scheduled restart job, restart counter is at 48.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Automatic restarting of the unit k3s.service has been scheduled, as the result of
-- the configured Restart= setting for the unit.
Aug 19 08:05:12 pchost0 systemd[1]: Stopped Lightweight Kubernetes.
-- Subject: A stop job for unit k3s.service has finished
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A stop job for unit k3s.service has finished.
--
-- The job identifier is 70877 and the job result is done.
Aug 19 08:05:12 pchost0 systemd[1]: Starting Lightweight Kubernetes...
-- Subject: A start job for unit k3s.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit k3s.service has begun execution.
--
-- The job identifier is 70877.
Aug 19 08:05:12 pchost0 sh[35423]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Aug 19 08:05:12 pchost0 sh[35429]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Aug 19 08:05:12 pchost0 k3s[35439]: time="2021-08-19T08:05:12.449738243Z" level=info msg="Starting k3s v1.21.3+k3s1 (1d1f220f)"
Aug 19 08:05:12 pchost0 k3s[35439]: time="2021-08-19T08:05:12.457797198Z" level=info msg="Configuring mysql database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
Aug 19 08:05:12 pchost0 k3s[35439]: time="2021-08-19T08:05:12.457861474Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
Aug 19 08:05:12 pchost0 k3s[35439]: time="2021-08-19T08:05:12.461610412Z" level=info msg="Database tables and indexes are up to date"
Aug 19 08:05:12 pchost0 k3s[35439]: time="2021-08-19T08:05:12.467076907Z" level=info msg="Kine listening on unix://kine.sock"
Aug 19 08:05:12 pchost0 k3s[35439]: time="2021-08-19T08:05:12.492501196Z" level=fatal msg="starting kubernetes: preparing server: bootstrap data already found and encrypted with different token"
Backporting
- Needs backporting to older releases
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 23 (6 by maintainers)
You need to provide the same --token to both servers when joining the cluster. If you didn't specify the token when starting the first server, you can get it off the disk on that node, and provide it to the second node. We put a warning about this in the SA but I'm concerned folks didn't see it 😕
If you’d set the token from the get-go you would have been fine, this only affects folks who let the first server auto-generate a token. We should have been enforcing use of a token from the start.
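Retrieving the auto-generated token from the first server can be sketched as follows. The file path is the standard k3s server token location per the k3s documentation; run this on the existing (first) server node:

```shell
# On the first server, k3s writes the auto-generated cluster token to disk.
# This value doubles as the datastore encryption token in this setup.
sudo cat /var/lib/rancher/k3s/server/token
```

The printed value is what must be passed as --token= when starting any additional server against the same external datastore.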
@rlabrecque grab the token off the first server you upgraded, and add it as --token=<TOKEN> to the INSTALL_K3S_EXEC string. The logic behind this change is explained in the advisory: https://github.com/k3s-io/k3s/security/advisories/GHSA-cxm9-4m6p-24mc - essentially, all your servers were previously using an empty string as the datastore encryption token; now they properly use the first server's token - which means you need to provide the token when adding new servers.
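Putting it together, the original install command from this report could be re-run on the second server with the token added. This is a sketch: <TOKEN> is a placeholder for the value read from the first server, and the MySQL endpoint and hostnames are the ones from the reporter's setup, not values you should copy verbatim:

```shell
# Join a second k3s server to the existing external-datastore cluster,
# supplying the first server's token so bootstrap data can be decrypted.
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='--write-kubeconfig-mode=644' sh -s - server \
  --token=<TOKEN> \
  --datastore-endpoint="mysql://user:password@tcp(xxx.home:3312)/kubernetes" \
  --node-taint CriticalAddonsOnly=true:NoExecute \
  --tls-san cluster.home
```

Without --token, the joining server derives an empty/different encryption key and fails with the "bootstrap data already found and encrypted with different token" fatal error seen above.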