k3s: [k3s-upgrade] k3s service failed to start after upgrade

Environmental Info: K3s Version:

k3s version v1.23.4+k3s1 (43b1cb48)
go version go1.17.5

Node(s) CPU architecture, OS, and Version:

5.4.0-1056-raspi #63-Ubuntu
aarch64 aarch64 aarch64 GNU/Linux

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal

Describe the bug: I tried to upgrade the k3s version of my cluster (master node and worker nodes) by following the k3s-upgrade documentation.

Steps To Reproduce:

kubectl apply -f https://raw.githubusercontent.com/rancher/system-upgrade-controller/master/manifests/system-upgrade-controller.yaml

# master nodes
kubectl label node <node-name> k3s-master-upgrade=true
# worker nodes
kubectl label node <node-name> k3s-worker-upgrade=true

# apply upgrade plan
kubectl apply -f agent.yml
kubectl apply -f server.yml
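
As a sanity check (not part of the original steps, assuming the labels and namespace above), the node labels and the upgrade controller can be verified before applying the plans:

# verify the upgrade labels were applied (sketch)
kubectl get nodes --show-labels | grep -E 'k3s-(master|worker)-upgrade'
# confirm the system-upgrade-controller is running in its namespace
kubectl -n system-upgrade get deploy,pods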

My plans: server.yml

# Server plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: k3s-master-upgrade
      operator: In
      values:
      - "true"
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.23.4+k3s1

agent.yml

# Agent plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: k3s-worker-upgrade
      operator: In
      values:
      - "true"
  prepare:
    args:
    - prepare
    - server-plan
    image: rancher/k3s-upgrade
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.23.4+k3s1
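
Once both plans are applied, progress can be followed through the plan objects and the jobs the controller creates; a minimal sketch using standard kubectl commands (not part of the original report):

# watch the plans and the upgrade jobs created by the controller
kubectl -n system-upgrade get plans -o wide
kubectl -n system-upgrade get jobs -w
# check the k3s version each node reports once the jobs finish
kubectl get nodes -o wide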

Expected behavior: All nodes to upgrade successfully to k3s version 1.23.4+k3s1

Actual behavior: On the master node, the upgrade replaced the k3s binary on the machine, but the k3s service then failed to start.

Additional context / logs:

Mar 28 09:25:54 huey sh[3502]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Mar 28 09:25:54 huey sh[3508]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Mar 28 09:25:55 huey k3s[799]: time="2022-03-28T09:25:55Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Starting k3s v1.23.4+k3s1 (43b1cb48)"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Database tables and indexes are up to date"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=info msg="Kine available at unix://kine.sock"
Mar 28 09:25:56 huey k3s[3517]: time="2022-03-28T09:25:56Z" level=fatal msg="starting kubernetes: preparing server: failed to normalize token; must be in format K10<CA-HASH>::<USERNAME>:<PASSWORD> or <PASS>
Mar 28 09:25:56 huey systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
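
The fatal "failed to normalize token" message suggests the server token k3s reads at startup is malformed or empty. A minimal diagnostic sketch, assuming the default data directory /var/lib/rancher/k3s; an empty file here matches the failure mode described in the comments below:

# inspect the token file the failing service is trying to parse
sudo ls -l /var/lib/rancher/k3s/server/token
sudo cat /var/lib/rancher/k3s/server/token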

Most upvoted comments

I deleted the k3s/server/token file from the filesystem and restarted the k3s systemd service. In my case k3s was able to restore the contents of that file.
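
For reference, a minimal sketch of that recovery on a single-server install, assuming the default data directory; it worked for this commenter but is not guaranteed everywhere:

# remove the (empty) token file and restart the service
sudo rm /var/lib/rancher/k3s/server/token
sudo systemctl restart k3s
# confirm the service came up and the token was recreated
systemctl status k3s --no-pager
sudo cat /var/lib/rancher/k3s/server/token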

I’m having the same issue on a single node cluster. I noticed that /var/lib/rancher/k3s/server/token has recently been written and is now empty.

What about multi-node clusters? I ran into this issue while trying to upgrade an agent node from 1.22.6+k3s1 to the latest. Can I just grab the token from another node and force inject it during the upgrade? The weirdest part is that it’s communicating with the cluster just fine.