k3s: Unable to join additional server in Dual Stack
Environmental Info: K3s Version:
# k3s -v
k3s version v1.22.3+k3s1 (61a2aab2)
go version go1.16.8
Node(s) CPU architecture, OS, and Version:
# uname -a
Linux k1 5.15.2-arch1-1 #1 SMP PREEMPT Fri, 12 Nov 2021 19:22:10 +0000 x86_64 GNU/Linux
Cluster Configuration:
Starting with 2 servers + external postgres database.
Describe the bug:
When starting a new cluster with a fresh database, the first server comes up without issue, but when joining a second server it appears to have trouble reaching the metrics-server and then fails.
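For context, one way to look at the failing aggregated API is to query the APIService object directly. This is a hedged diagnostic sketch, not something from the original report; it assumes kubectl is run on a server node using the kubeconfig that k3s writes by default:
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
# show whether the aggregated metrics API is marked Available
kubectl get apiservice v1beta1.metrics.k8s.io -o wide
# show the Service and Endpoints backing it in kube-system
kubectl -n kube-system get svc,endpoints metrics-server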
Steps To Reproduce:
- Installed K3s on the first server:
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22 sh -s \
- server \
--datastore-endpoint="postgres://k3s:PW@db.example.com:5432/k3s" \
--node-ip="172.16.15.21,fc15::21" \
--cluster-cidr="10.42.0.0/16,fc15:1::/64" \
--service-cidr="10.43.0.0/16,fc15:2::/112" \
--disable-network-policy \
--tls-san=k.znet
Then I wait for all pods to come up. Then, on the second server:
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22 K3S_TOKEN="TOKEN" sh -s \
- server \
--datastore-endpoint="postgres://k3s:PW@db.example.com:5432/k3s" \
--node-ip="172.16.15.22,fc15::22" \
--cluster-cidr="10.42.0.0/16,fc15:1::/64" \
--service-cidr="10.43.0.0/16,fc15:2::/112" \
--disable-network-policy \
--tls-san=k.znet
The install never completes on the second node, but journalctl shows it is still trying. Some variation of the following keeps looping:
Nov 18 10:08:55 k2 k3s[518]: E1118 10:08:55.369789 518 available_controller.go:524] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.170.124:443/apis/metrics.k8s.io/v1beta1: Get "https://10.43.170.124:443/apis/metrics.k8s.io/v1beta1": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Nov 18 10:09:00 k2 k3s[518]: E1118 10:09:00.381292 518 available_controller.go:524] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io": the object has been modified; please apply your changes to the latest version and try again
Nov 18 10:09:00 k2 k3s[518]: E1118 10:09:00.879126 518 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"a93b592512d21f5ee58e6d36cc16a2596a78392a52fd72a88ca6c75e88e9a562\": open /run/flannel/subnet.env: no such file or directory"
Nov 18 10:09:00 k2 k3s[518]: E1118 10:09:00.879209 518 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"a93b592512d21f5ee58e6d36cc16a2596a78392a52fd72a88ca6c75e88e9a562\": open /run/flannel/subnet.env: no such file or directory" pod="kube-system/svclb-traefik-vpwtd"
Nov 18 10:09:00 k2 k3s[518]: E1118 10:09:00.879244 518 kuberuntime_manager.go:818] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"a93b592512d21f5ee58e6d36cc16a2596a78392a52fd72a88ca6c75e88e9a562\": open /run/flannel/subnet.env: no such file or directory" pod="kube-system/svclb-traefik-vpwtd"
Nov 18 10:09:00 k2 k3s[518]: E1118 10:09:00.879337 518 pod_workers.go:836] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"svclb-traefik-vpwtd_kube-system(4e5f21a9-c545-4f71-81a3-16ba98d64d2d)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"svclb-traefik-vpwtd_kube-system(4e5f21a9-c545-4f71-81a3-16ba98d64d2d)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"a93b592512d21f5ee58e6d36cc16a2596a78392a52fd72a88ca6c75e88e9a562\\\": open /run/flannel/subnet.env: no such file or directory\"" pod="kube-system/svclb-traefik-vpwtd" podUID=4e5f21a9-c545-4f71-81a3-16ba98d64d2d
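The flannel errors above may just be a downstream symptom: /run/flannel/subnet.env is typically only written once flannel has a per-node subnet to work with, so it is worth checking whether the second node ever had pod CIDRs allocated to it. A hedged sketch of that check (assumes kubectl on the working first server, plus a shell on the second node):
# per-node pod CIDR allocations made by the controller-manager
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDRS:.spec.podCIDRs
# on the second node: the file flannel is complaining about
cat /run/flannel/subnet.env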
Expected behavior: I’d expect the install to complete.
Actual behavior: Installation of the second node never completes, repeating the errors above. The first node works fine and I can deploy pods to it with no issue.
Additional context / logs: See above.
Backporting
- Needs backporting to older releases
@xaque208 Manuel is currently out enjoying some time off, but I was looking at this issue with him before he left and can provide some info around this.
I think a small note can be added for sure.
You should be able to set the kube-controller-manager's --node-cidr-mask-size-ipv6 by using --kube-controller-manager-arg="node-cidr-mask-size-ipv6=64" on k3s.
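Applied to the reproduction steps above, that would look roughly like the following. This is a minimal sketch: everything except the last flag is copied verbatim from the first-server command, and the mask value 64 is the one suggested above.
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22 sh -s \
- server \
--datastore-endpoint="postgres://k3s:PW@db.example.com:5432/k3s" \
--node-ip="172.16.15.21,fc15::21" \
--cluster-cidr="10.42.0.0/16,fc15:1::/64" \
--service-cidr="10.43.0.0/16,fc15:2::/112" \
--disable-network-policy \
--tls-san=k.znet \
--kube-controller-manager-arg="node-cidr-mask-size-ipv6=64"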