k3s: [k3s-upgrade] k3s service failed to start after upgrade
Environmental Info: K3s Version:
k3s version v1.23.4+k3s1 (43b1cb48)
go version go1.17.5
Node(s) CPU architecture, OS, and Version:
5.4.0-1056-raspi #63-Ubuntu
aarch64 aarch64 aarch64 GNU/Linux
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal
Describe the bug: I tried to upgrade the k3s version of my cluster (master node and worker nodes) by following this: k3s-upgrade
Steps To Reproduce:
kubectl apply -f https://raw.githubusercontent.com/rancher/system-upgrade-controller/master/manifests/system-upgrade-controller.yaml
# master nodes
kubectl label node <node-name> k3s-master-upgrade=true
# worker nodes
kubectl label node <node-name> k3s-worker-upgrade=true
# apply upgrade plan
kubectl apply -f agent.yml
kubectl apply -f server.yml
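Not part of the original steps, but as a sanity check it may help to confirm that the controller and both plans were picked up before expecting any upgrade jobs (assuming the default system-upgrade namespace):
# confirm the system-upgrade-controller pod is running
kubectl -n system-upgrade get pods
# confirm both plans exist and watch the upgrade jobs they spawn
kubectl -n system-upgrade get plans
kubectl -n system-upgrade get jobs -w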
my plans:
server.yml
# Server plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: k3s-master-upgrade
      operator: In
      values:
      - "true"
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.23.4+k3s1
agent.yml
# Agent plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: k3s-worker-upgrade
      operator: In
      values:
      - "true"
  prepare:
    args:
    - prepare
    - server-plan
    image: rancher/k3s-upgrade
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.23.4+k3s1
Expected behavior:
All nodes upgrade successfully to k3s v1.23.4+k3s1.
Actual behavior: on the master node the k3s binary was updated, but the k3s service failed to start afterward.
Additional context / logs:
Mar 28 09:25:54 huey sh[3502]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Mar 28 09:25:54 huey sh[3508]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Mar 28 09:25:55 huey k3s[799]: time="2022-03-28T09:25:55Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Starting k3s v1.23.4+k3s1 (43b1cb48)"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Database tables and indexes are up to date"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Kine available at unix://kine.sock"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=fatal msg="starting kubernetes: preparing server: failed to normalize token; must be in format K10<CA-HASH>::<USERNAME>:<PASSWORD> or <PASS>
Mar 28 09:25:56 huey systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
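Given the "failed to normalize token" fatal error, a minimal sketch for checking the state of the server token file (assuming a default install without a custom --data-dir; these are my own diagnostic commands, not from the original report):
# check the service state and whether the token file is present and non-empty
sudo systemctl status k3s --no-pager
sudo ls -l /var/lib/rancher/k3s/server/token
sudo wc -c /var/lib/rancher/k3s/server/token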
About this issue
- State: closed
- Created 2 years ago
- Comments: 15 (6 by maintainers)
I deleted the k3s/server/token file from the filesystem and restarted the k3s systemd service. In my case k3s was able to restore the contents of that file.

I'm having the same issue on a single node cluster. I noticed that /var/lib/rancher/k3s/server/token has recently been written and is now empty.

What about multi-node clusters? I ran into this issue while trying to upgrade an agent node from 1.22.6+k3s1 to the latest. Can I just grab the token from another node and force inject it during the upgrade? The weirdest part is that it's communicating with the cluster just fine.
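A hedged sketch of the workaround described in the first comment (server/single-node case; paths assume a default install, and taking a backup first is my own addition, not something the commenter mentioned):
# stop the service, keep a copy of the (possibly empty) token file, then let k3s regenerate it
sudo systemctl stop k3s
sudo cp /var/lib/rancher/k3s/server/token /var/lib/rancher/k3s/server/token.bak
sudo rm /var/lib/rancher/k3s/server/token
sudo systemctl start k3s
# the commenter reported that k3s restored the file's contents on restart
sudo cat /var/lib/rancher/k3s/server/token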