hcloud-cloud-controller-manager: Cloud-Controller w/network (native routing) does not create correct routes
Hello,
I’ve been playing around with Kubernetes 1.19 on hcloud for a bit now. Since the documentation about this is pretty old, I’ve mostly been trying to figure it out on my own.
So my current setup:
- 1x Network (10.0.0.0/8)
- 1x LB (for a later HA setup of the control planes, 10.0.0.5 here)
- 1x CPX11 (control plane)
- 2x CPX11 (worker nodes)
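For reference, roughly how this layout could be created with the hcloud CLI (a sketch; the resource names, subnet range, location, and image are just examples):
# placeholders: network/LB names, subnet range, location and image are examples
hcloud network create --name k8s-net --ip-range 10.0.0.0/8
hcloud network add-subnet k8s-net --type cloud --network-zone eu-central --ip-range 10.0.0.0/24
hcloud server create --name test-cluster-master-01 --type cpx11 --image ubuntu-20.04 --network k8s-net
hcloud server create --name test-cluster-worker-01 --type cpx11 --image ubuntu-20.04 --network k8s-net
hcloud server create --name test-cluster-worker-02 --type cpx11 --image ubuntu-20.04 --network k8s-net
hcloud load-balancer create --name control-plane-lb --type lb11 --location nbg1
hcloud load-balancer attach-to-network control-plane-lb --network k8s-net --ip 10.0.0.5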
Using kubeadm to set up the Kubernetes cluster:
kubeadm init --ignore-preflight-errors=NumCPU --apiserver-cert-extra-sans $API_SERVER_CERT_EXTRA_SANS --control-plane-endpoint "$CONTROL_PLANE_LB" \
--upload-certs --kubernetes-version=$KUBE_VERSION --pod-network-cidr=$POD_NETWORK_CIDR
with the following variables:
API_SERVER_CERT_EXTRA_SANS=10.0.0.1
CONTROL_PLANE_LB=10.0.0.5
KUBE_VERSION=v1.19.0
POD_NETWORK_CIDR=10.224.0.0/16
After that I copy the kubeconfig and create the secrets for the Hetzner CCM and CSI driver like this:
apiVersion: v1
kind: Secret
metadata:
  name: hcloud
  namespace: kube-system
stringData:
  token: "<hetzner_api_token>"
  network: "<hetzner_network_id>"
---
apiVersion: v1
kind: Secret
metadata:
  name: hcloud-csi
  namespace: kube-system
stringData:
  token: "<hetzner_api_token>"
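Assuming the manifest above is saved as hcloud-secrets.yaml (the filename is just an example), I apply and verify it with:
kubectl apply -f hcloud-secrets.yaml
kubectl -n kube-system get secrets hcloud hcloud-csi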
Following that, I deploy the networks variant of the CCM:
kubectl apply -f https://raw.githubusercontent.com/hetznercloud/hcloud-cloud-controller-manager/master/deploy/ccm-networks.yaml
The cloud controller becomes ready, and the nodes have the hcloud://<server-id> provider ID in their describe output.
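(A quick way to check that, sketched with a jsonpath of my own choosing:)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'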
Now I deploy the latest cilium with a few tweaked parameters:
wget https://raw.githubusercontent.com/cilium/cilium/1.9.0/install/kubernetes/quick-install.yaml
Edit the quick-install.yaml and ensure the following parameters are set (they end up in the cilium-config ConfigMap; a sketch follows below):
tunnel: disabled
masquerade: "true"
enable-endpoint-routes: "true"
native-routing-cidr: "10.0.0.0/8"
cluster-pool-ipv4-cidr: "10.224.0.0/16"
Apply the deployment file.
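For clarity, a minimal sketch of how these keys look inside the cilium-config ConfigMap in quick-install.yaml (heavily abbreviated; all other keys keep their defaults):
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # only the tweaked keys are shown here
  tunnel: disabled
  masquerade: "true"
  enable-endpoint-routes: "true"
  native-routing-cidr: "10.0.0.0/8"
  cluster-pool-ipv4-cidr: "10.224.0.0/16"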
Now the CNI is installed, CoreDNS starts getting scheduled, and the CCM creates routes for the nodes. So far so good, yet the created routes seem wrong to me.

These are the routes I see in the cloud console:
10.224.0.0/24 routes to 10.0.0.2 (master-01)
10.224.1.0/24 routes to 10.0.0.3 (worker-01)
10.224.2.0/24 routes to 10.0.0.4 (worker-02)
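(The same routes can also be checked from the CLI; a sketch, using the same network placeholder as above:)
hcloud network describe <hetzner_network_id>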
Yet kubectl get pods -A -o wide shows a different IP distribution:
root@test-cluster-master-01:~# k get pods -A -owide
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE   IP             NODE                     NOMINATED NODE   READINESS GATES
kube-system   cilium-dkwbl              1/1     Running   0          46m   10.0.0.3       test-cluster-worker-01   <none>           <none>
kube-system   cilium-g7whv              1/1     Running   0          46m   10.0.0.2       test-cluster-master-01   <none>           <none>
kube-system   cilium-k4tww              1/1     Running   0          46m   10.0.0.4       test-cluster-worker-02   <none>           <none>
kube-system   coredns-f9fd979d6-6l8nx   0/1     Running   0          48m   10.224.0.101   test-cluster-worker-01   <none>           <none>
kube-system   coredns-f9fd979d6-q7dz4   0/1     Running   0          48m   10.224.1.157   test-cluster-worker-02   <none>           <none>
Where you can see:
10.224.0.101 is scheduled on test-cluster-worker-01 which, according to the routes in the cloud console, should have 10.224.1.0/24.
10.224.1.157 is scheduled on test-cluster-worker-02, which should have 10.224.2.0/24.
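For comparison, the per-node CIDRs the route controller works from (node.spec.podCIDR) and the CIDRs Cilium actually allocated can be listed like this (a sketch; the CiliumNode field path is taken from Cilium 1.9 and may differ in other versions):
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR
kubectl get ciliumnodes -o custom-columns=NAME:.metadata.name,PODCIDRS:.spec.ipam.podCIDRs[*]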
Can someone please point me in the right direction for resolving this issue?
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 40 (12 by maintainers)
@philipp1992 @kiwinesian Native routing saves one layer of tunneling/vxlan. You should probably know why that is an advantage.
@kiwinesian Created a project for this: https://github.com/mysticaltech/kube-hetzner, all works well, including full kube-proxy replacement. However, even though everything was set up with Cilium for native routing, I had to use
tunnel: geneve (see https://github.com/mysticaltech/kube-hetzner/blob/master/manifests/helm/cilium/values.yaml) to make everything really stable; somehow, pure native routing did not make the Hetzner CSI happy (maybe more debugging is needed in the future). The Geneve tunnel overhead is really low. So thanks to Cilium in combination with Fedora, we now have full BPF support and full kube-proxy replacement with the improvements that it brings.
If you’re still curious, I solved it too.
This is my working cilium-file with cilium 1.9.5: https://github.com/nupplaphil/hcloud-k8s/blob/stable/roles/kube-master/files/cilium.yaml
But you have to keep an eye on your CIDRs in other places too (as @AlexMe99 already said), like https://github.com/nupplaphil/hcloud-k8s/blob/stable/roles/kube-master/files/hcloud-controller.yaml and https://github.com/nupplaphil/hcloud-k8s/blob/f8ee5f18319ad3957a052603c84c4627d23a14e1/roles/kube-master/tasks/tasks.yaml#L6
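(For context, the CIDR-related flags in the upstream ccm-networks.yaml deployment look roughly like this, abbreviated; the cluster CIDR there defaults to 10.244.0.0/16 and has to match your pod network CIDR:)
      containers:
        - name: hcloud-cloud-controller-manager
          command:
            - "/bin/hcloud-cloud-controller-manager"
            - "--cloud-provider=hcloud"
            - "--allow-untagged-cloud"
            - "--allocate-node-cidrs=true"
            - "--cluster-cidr=10.244.0.0/16"  # change this to match your pod network CIDR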
You can basically do these different flavors:
I’m doing the third variant. You can do that with different plugins, e.g. Cilium (native-routing-cidr), Flannel (backend type “alloc”), Calico (no IPIP, no VXLAN).
Ohh, thanks for sharing @mysticaltech! I might find a weekend to spin the cluster up using your configuration 😉
I just managed to get the k8s cluster going, but there are a few things that I would like to validate with you to see if they make sense / are ideal:
Native-Routing: instead, I have to set ipam=kubernetes, tunnel=disabled and nativeRoutingCIDR=x.x.x.x/8. I ended up leaving it enabled and it seems to be keeping things happy - even though the reference said to do so: https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/master/docs/deploy_with_networks.md. I’m wondering if you are aware of the impact of #1 and #2 if left as they are? Should I attempt with tunnel=geneve?
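(For reference, those settings expressed as Cilium 1.9 Helm values would look roughly like this; the exact key names are my assumption and can differ between chart versions:)
ipam:
  mode: kubernetes
tunnel: disabled
nativeRoutingCIDR: "10.0.0.0/8"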
Hi @mysticaltech,
Yes, for sure! Out of curiosity, have you looked into Calico?
Looks like they also have eBPF in the latest release.
Hi @mysticaltech,
thanks so much for attending to this! The interesting part is that Ubuntu 18.04 works completely fine with iptables, but the csi-provisioner is crashing in Ubuntu 20.04 (and Debian 10) using the same exact config for cilium.yaml. I can try the Fedora 34 one and see if I can get it going. Will have to rewrite the Ansible script to deploy all of this - will report back maybe after the weekend. 😃
@AlexMe99
Thanks for the info. Could you share the exact settings? How did you create the hcloud network, which arguments did you pass to the k3s master and workers, and which Cilium deployment did you use?
Kind regards, Philipp
@ByteAlex I went through this topic when I wanted to init a k8s cluster (v1.20.0) on hetzner-cloud with Cilium (1.9.4). I faced similar issues. What solved them for me was (1) taking care of setting the appropriate --node-ip (internal network IP) on each node (master + worker) as a kubelet start argument (via a kubelet.service.d kubelet-extra-args drop-in) and (2) creating a subset of the general Hetzner network for the subnet and other subsets for the pod and service networks. Something like this: Network: 10.0.0.0/8; Subnet: 10.1.0.0/16; Pod-Net: 10.2.0.0/16; Srv-Net: 10.3.0.0/16. Especially the appropriate setting of the networks was important. I’m not a networking person, but the masquerading seems to kill every attempt at separating the subnet and the pod/service networks into different domains (like 192.… or similar).
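For illustration, such a kubelet drop-in could look like this (the file name and IP are only examples):
# /etc/systemd/system/kubelet.service.d/20-node-ip.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--node-ip=10.1.0.2"
Afterwards, systemctl daemon-reload && systemctl restart kubelet picks up the new argument.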