k3s: Can't reboot/recreate master nodes due to the existing passwd file
Environmental Info: K3s Version: 1.22.5+k3s-1
Node(s) CPU architecture, OS, and Version: 2 x amd64 Ubuntu 20.04.3
Kernel: 5.4.0-92-generic
Cluster Configuration: 2 master nodes with MySQL backend
Describe the bug: After upgrading the cluster from 1.21.4 to 1.21.8 and finally to 1.22.5, the cluster was healthy and running with both nodes. But after rebooting a node, k3s won’t start anymore and I get the error:
Jan 11 11:18:33 raseed-test-server-1 k3s[13039]: time="2022-01-11T11:18:33Z" level=fatal msg="/var/lib/rancher/k3s/server/cred/passwd newer than datastore and could cause cluster outage. Remove the file from disk and restart to be recreated from datastore."
The file exists on disk:
# ls -l /var/lib/rancher/k3s/server/cred/passwd
-rw------- 1 root root 111 Jan 10 13:59 /var/lib/rancher/k3s/server/cred/passwd
Sure, I can remove the file and k3s will start, but that is not very practical for automation.
How can I prevent the error, and which database entry is the file compared against in that function?
About this issue
- State: closed
- Created 2 years ago
- Comments: 41 (20 by maintainers)
I did the same (deleted the bootstrap data) on my SQLite database, and everything seems to work now, even after the 1.21.9 upgrade.
Just to summarize:
starting kubernetes: preparing server: bootstrap data already found and encrypted with different token
So, it seems we need functionality to update the bootstrap data in the DB with the newer token, either as an emergency tool or as a regular operation.
Almost the same unfortunately isn’t good enough. The token value cannot currently be changed once the cluster is created - it must be set initially and remain set to the same value across all servers in the cluster.
If a token is not specified in the configuration, a token is randomly generated and written to disk. I think that what probably happened is that the token was changed at some point, but until we added bootstrap data reconciliation, it didn’t matter since the data was already on disk. Now that we attempt to reconcile, the change is causing problems.
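To make the "randomly generated and written to disk" behaviour concrete, here is a minimal Python sketch of a load-or-create token lookup. It is not K3s code: the function name `resolve_token`, the file layout, the token format, and the precedence of a configured value over the on-disk value are all assumptions made for illustration.

```python
import os
import secrets
from typing import Optional


def resolve_token(token_path: str, configured: Optional[str] = None) -> str:
    """Return the token a server would use in this simplified model.

    Hypothetical helper: precedence and file format are assumptions.
    """
    # An explicitly configured token wins over anything on disk.
    if configured is not None:
        return configured

    # Otherwise reuse a token that was generated on an earlier start.
    if os.path.exists(token_path):
        with open(token_path, encoding="utf-8") as f:
            return f.read().strip()

    # First start without configuration: generate a random token and persist it
    # in a file readable only by the owner.
    token = secrets.token_hex(16)
    fd = os.open(token_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(token + "\n")
    return token
```

In this model, a server that first started without a configured token and was later given one ends up using a different value from the one its existing data was created with, which is the kind of mismatch described above.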
You may be right; resolving this may require creating a tool, or adding functionality to K3s itself, to allow authoritatively setting the token and triggering a reset of the bootstrap data using the current on-disk state of a server.
Hmm, are you sure they’re exactly the same? The file timestamps are only checked if the content is different, so if they are in fact the same it shouldn’t care what the timestamp is.
https://github.com/k3s-io/k3s/blob/f662a7f45b95a2bbf2d55af63761e62d4e2731c0/pkg/cluster/bootstrap.go#L470-L477
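The ordering described here (compare content first, consult timestamps only when the content differs) can be sketched in Python as follows. This is a simplified illustration, not a translation of the linked bootstrap.go; `FileState`, `reconcile`, and the returned labels are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    content: bytes
    mtime: float  # modification time, seconds since the epoch


def reconcile(disk: FileState, datastore: FileState) -> str:
    """Return 'in_sync', 'disk_newer', or 'use_datastore' (hypothetical labels)."""
    # Identical content: the timestamps are never consulted.
    if disk.content == datastore.content:
        return "in_sync"

    # Content differs: only now do the timestamps matter.
    if disk.mtime > datastore.mtime:
        # Corresponds to the "newer than datastore" situation in the report.
        return "disk_newer"
    return "use_datastore"
```

For example, `reconcile(FileState(b"a", 200.0), FileState(b"a", 100.0))` returns `"in_sync"` even though the on-disk copy has the later timestamp, because the contents match.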
Yeah, that indicates that the data in the database hasn’t been migrated to the newer format that includes the timestamp yet. Can you confirm that the contents of /var/lib/rancher/k3s/server/cred/passwd on both nodes are the same, and match the contents of the PasswdFile entry in your datastore?
Is there anything unique to your environment that would cause this file to be modified? This is raising an error because it’s not expected that it would be out of sync after a restart.
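One way to compare the file across nodes without pasting credentials into the thread is to compare digests. The Python sketch below is only a suggestion; `file_sha256` and `same_content` are hypothetical helper names, and extracting the PasswdFile entry from the datastore is left out because its storage format is not described here.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_sha256(path: str) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_content(digests: Dict[str, str]) -> bool:
    """True if every source (node name -> digest) reports the same digest."""
    return len(set(digests.values())) <= 1
```

Running `file_sha256("/var/lib/rancher/k3s/server/cred/passwd")` as root on each server and passing the collected results to `same_content` shows whether the copies agree.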
cc @briandowns