k3s: Unable to join additional server in Dual Stack
Environmental Info: K3s Version:
# k3s -v
k3s version v1.22.3+k3s1 (61a2aab2)
go version go1.16.8
Node(s) CPU architecture, OS, and Version:
# uname -a
Linux k1 5.15.2-arch1-1 #1 SMP PREEMPT Fri, 12 Nov 2021 19:22:10 +0000 x86_64 GNU/Linux
Cluster Configuration:
Starting with 2 servers + external postgres database.
Describe the bug:
When starting a new cluster with a fresh database, the first server comes up without issue, but when joining a second server it appears to have trouble reaching the metrics-server and then fails.
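For context, one way to look at the failing aggregated API is to query the APIService object directly. This is a hedged diagnostic sketch, not something from the original report; it assumes kubectl is run on a server node using the kubeconfig that k3s writes by default:
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
# show whether the aggregated metrics API is marked Available
kubectl get apiservice v1beta1.metrics.k8s.io -o wide
# show the Service and Endpoints backing it in kube-system
kubectl -n kube-system get svc,endpoints metrics-server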
Steps To Reproduce:
- Installed K3s on the first server:
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22 sh -s \
- server \
--datastore-endpoint="postgres://k3s:PW@db.example.com:5432/k3s" \
--node-ip="172.16.15.21,fc15::21" \
--cluster-cidr="10.42.0.0/16,fc15:1::/64" \
--service-cidr="10.43.0.0/16,fc15:2::/112" \
--disable-network-policy \
--tls-san=k.znet
Then I wait for all pods to come up. Then, on the second server:
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22 K3S_TOKEN="TOKEN" sh -s \
- server \
--datastore-endpoint="postgres://k3s:PW@db.example.com:5432/k3s" \
--node-ip="172.16.15.22,fc15::22" \
--cluster-cidr="10.42.0.0/16,fc15:1::/64" \
--service-cidr="10.43.0.0/16,fc15:2::/112" \
--disable-network-policy \
--tls-san=k.znet
The install never completes on the second node, but journalctl shows it is still trying. Some variation of the following keeps looping:
Nov 18 10:08:55 k2 k3s[518]: E1118 10:08:55.369789 518 available_controller.go:524] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.170.124:443/apis/metrics.k8s.io/v1beta1: Get "https://10.43.170.124:443/apis/metrics.k8s.io/v1beta1": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Nov 18 10:09:00 k2 k3s[518]: E1118 10:09:00.381292 518 available_controller.go:524] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io": the object has been modified; please apply your changes to the latest version and try again
Nov 18 10:09:00 k2 k3s[518]: E1118 10:09:00.879126 518 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"a93b592512d21f5ee58e6d36cc16a2596a78392a52fd72a88ca6c75e88e9a562\": open /run/flannel/subnet.env: no such file or directory"
Nov 18 10:09:00 k2 k3s[518]: E1118 10:09:00.879209 518 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"a93b592512d21f5ee58e6d36cc16a2596a78392a52fd72a88ca6c75e88e9a562\": open /run/flannel/subnet.env: no such file or directory" pod="kube-system/svclb-traefik-vpwtd"
Nov 18 10:09:00 k2 k3s[518]: E1118 10:09:00.879244 518 kuberuntime_manager.go:818] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"a93b592512d21f5ee58e6d36cc16a2596a78392a52fd72a88ca6c75e88e9a562\": open /run/flannel/subnet.env: no such file or directory" pod="kube-system/svclb-traefik-vpwtd"
Nov 18 10:09:00 k2 k3s[518]: E1118 10:09:00.879337 518 pod_workers.go:836] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"svclb-traefik-vpwtd_kube-system(4e5f21a9-c545-4f71-81a3-16ba98d64d2d)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"svclb-traefik-vpwtd_kube-system(4e5f21a9-c545-4f71-81a3-16ba98d64d2d)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"a93b592512d21f5ee58e6d36cc16a2596a78392a52fd72a88ca6c75e88e9a562\\\": open /run/flannel/subnet.env: no such file or directory\"" pod="kube-system/svclb-traefik-vpwtd" podUID=4e5f21a9-c545-4f71-81a3-16ba98d64d2d
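The flannel errors above may just be a downstream symptom: /run/flannel/subnet.env is typically only written once flannel has a per-node subnet to work with, so it is worth checking whether the second node ever had pod CIDRs allocated to it. A hedged sketch of that check (assumes kubectl on the working first server, plus a shell on the second node):
# per-node pod CIDR allocations made by the controller-manager
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDRS:.spec.podCIDRs
# on the second node: the file flannel is complaining about
cat /run/flannel/subnet.env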
Expected behavior: I’d expect the install to complete.
Actual behavior: Installation of the second node never completes, repeating the errors above. The first node works fine and I can deploy pods to it with no issue.
Additional context / logs: See above.
Backporting
- Needs backporting to older releases
@xaque208 Manuel is currently out enjoying some time off, but I was looking at this issue with him before he left and can provide some info around this.
I think a small note can be added for sure.
You should be able to set the kube-controller-manager's --node-cidr-mask-size-ipv6 by using --kube-controller-manager-arg="node-cidr-mask-size-ipv6=64" on k3s.
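Applied to the reproduction steps above, that would look roughly like the following. This is a minimal sketch: everything except the last flag is copied verbatim from the first-server command, and the mask value 64 is the one suggested above.
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22 sh -s \
- server \
--datastore-endpoint="postgres://k3s:PW@db.example.com:5432/k3s" \
--node-ip="172.16.15.21,fc15::21" \
--cluster-cidr="10.42.0.0/16,fc15:1::/64" \
--service-cidr="10.43.0.0/16,fc15:2::/112" \
--disable-network-policy \
--tls-san=k.znet \
--kube-controller-manager-arg="node-cidr-mask-size-ipv6=64"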