rancher: RKE INTERNAL-IP and EXTERNAL-IP addresses are not correctly set
What kind of request is this (question/bug/enhancement/feature request): bug.
Steps to reproduce (least amount of steps as possible): Add a node to an RKE cluster as:
rancher_ip_address="${1:-10.1.0.3}"; shift || true
node_ip_address="$rancher_ip_address"
# note: cluster_response, admin_api_token and rancher_server_url are defined
# earlier in the provisioning script (not shown in this excerpt).
# register this node as a rancher-agent.
echo "getting the rancher-agent registration command..."
cluster_id="$(echo "$cluster_response" | jq -r .id)"
cluster_registration_response="$(
    wget -qO- \
        --header 'Content-Type: application/json' \
        --header "Authorization: Bearer $admin_api_token" \
        --post-data '{"type":"clusterRegistrationToken","clusterId":"'$cluster_id'"}' \
        "$rancher_server_url/v3/clusterregistrationtoken")"
echo "registering this node as a rancher-agent..."
rancher_agent_registration_command="
$(echo "$cluster_registration_response" | jq -r .nodeCommand)
--address $node_ip_address
--internal-address $node_ip_address
--etcd
--controlplane
--worker"
$rancher_agent_registration_command
Result:
The INTERNAL-IP and EXTERNAL-IP are not correctly set, as can be seen in the following output:
# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
server Ready controlplane,etcd,worker 36m v1.15.3 192.168.121.150 <none> Ubuntu 18.04.3 LTS 4.15.0-58-generic docker://19.3.1
# kubectl describe nodes
Name: server
Roles: controlplane,etcd,worker
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=server
kubernetes.io/os=linux
node-role.kubernetes.io/controlplane=true
node-role.kubernetes.io/etcd=true
node-role.kubernetes.io/worker=true
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"06:0f:17:92:00:ef"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 10.1.0.3
node.alpha.kubernetes.io/ttl: 0
rke.cattle.io/external-ip: 10.1.0.3
rke.cattle.io/internal-ip: 10.1.0.3
volumes.kubernetes.io/controller-managed-attach-detach: true
Addresses:
InternalIP: 192.168.121.150
Hostname: server
Other details that may be helpful:
This is using a Vagrant VM which has two interfaces, eth0 (192.168.121.150) and eth1 (10.1.0.3). It should use the eth1 (10.1.0.3) IP address as the INTERNAL-IP and EXTERNAL-IP addresses.
The vagrant environment is at https://github.com/rgl/rancher-single-node-ubuntu-vagrant.
Environment information
- Rancher version (rancher/rancher / rancher/server image tag or shown bottom left in the UI): 2.2.8
- Installation option (single install/HA): single
Cluster information
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Custom/RKE (as launched by rancher UI)
- Machine type (cloud/VM/metal) and specifications (CPU/memory): VM/4core/4GBram
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
- Docker version (use docker version):
Client: Docker Engine - Community
Version: 19.03.1
API version: 1.40
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:21:05 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.1
API version: 1.40 (minimum version 1.12)
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:19:41 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 27
- Comments: 46 (1 by maintainers)
I just ran into this issue this week and found a work-around. It seems like the main issue is that Rancher does not pass the --node-ip flag to kubelet. If the node IP is not set, then kubelet determines it automatically. All other components (including Rancher itself) grab the IP which is set by kubelet. The behavior kubelet uses to determine the IP can be found here: https://github.com/kubernetes/kubernetes/blob/0e0abd602fac12c4422f8fe89c1f04c34067a76f/pkg/kubelet/nodestatus/setters.go#L214; it boils down to: use an explicit --node-ip if given, otherwise a cloud-provider-reported address, otherwise a DNS lookup of the node's hostname, and finally the IP of the interface holding the default route.
So simply adding your desired node IP along with the node's hostname to /etc/hosts solves the problem (see the sketch below).

Facing this issue as well. Internal-IP is set to a public IP address and I cannot get the nodes to communicate over the given private IP address. I've tried kubectl edit node <node-id>, which doesn't seem to have any effect.

Not stale. Is there still no "proper" way to provision an RKE cluster with dual NICs?
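To illustrate the /etc/hosts workaround from the first comment above, here is a minimal sketch; the hostname and IP are taken from the original report, and restarting the kubelet container assumes an RKE-provisioned node:

# make kubelet's hostname lookup resolve to the desired node IP
echo '10.1.0.3 server' | sudo tee -a /etc/hosts
# restart the RKE-managed kubelet container so it re-detects its address
sudo docker restart kubelet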
I think the point is being missed here and the focus is being pushed onto SSL. Let’s ignore SSL for now as SSL can be achieved multiple ways.
The issue is that the external and internal IPs are set incorrectly so that communication between nodes does not work over a public network.
In conversations with others in the Rancher community we managed to get a working setup (outside of Hetzner); however, this involved using a stack where we could remove the public network interface and have only one, private, interface as eth0.
The setup we're aiming for here is that each node has two network interfaces (as provided by Hetzner). This can't really be changed. eth0 is connected to the public WAN. ens11 is the private network between nodes.
We want to be able to secure communication to Rancher using a firewall. With autoscaling in place, we don't know the IP address of each new node, but we do know the subnet of the private LAN so we can allow access via that. We also want to avoid traffic going over the WAN entirely.
Any public network access happens via the load balancers which send traffic over the private network (ens11). So we essentially want to disable any communication over eth0.
Now when we launch new nodes with RKE they don't set the interface / internal network correctly, so connections to the Rancher server and various endpoints attempt to use eth0, which fails because the firewall blocks all traffic over the public WAN.
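As a rough sketch of the firewall policy described above (the interface names are from the Hetzner setup mentioned in this comment; the private subnet is only an example):

# allow everything arriving on the private interface from the private LAN
sudo ufw allow in on ens11 from 10.0.0.0/16
# keep SSH reachable for administration (adjust or remove as needed)
sudo ufw allow 22/tcp
# drop everything else coming in over the public WAN interface
sudo ufw deny in on eth0 to any
sudo ufw enable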
I've managed to "fix" the way Flannel extracts the host IP by passing the options below through the cluster.yml file (sketched after the bullets). There's still a problem with the method used to extract the Internal IP label as seen below: Internal IP is actually the public IP address of the host even though I've specified the node addresses in cluster.yml (address: 10.0.0.5 and internal_address: 10.0.0.5).
* I'm trying to configure Kubernetes to communicate between hosts using Hetzner private networks while blocking all incoming traffic from the public IP address.
* If it helps, here's the entire cluster.yml file: https://gist.github.com/iosifnicolae2/87805e421a9faf83ca632825d1d6946b
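The exact options aren't preserved in this archive; presumably they are the interface overrides for the network plugin, which in an RKE cluster.yml look roughly like the following (shown for the canal plugin; the flannel plugin uses flannel_iface instead, and the interface name is only an example):

network:
  plugin: canal
  options:
    # make flannel (inside canal) bind the private interface
    # instead of the interface holding the default route
    canal_iface: ens10
    canal_flannel_backend_type: vxlan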
Update
I've managed to solve the Internal IP problem by removing the hostname_override variable from cluster.yml.

I believe I found a working solution for my problem. Let me explain my setup:
There are three CX31 nodes running at Hetzner Cloud (the type shouldn’t matter, I’m just being thorough). All are provisioned with a combination of Cloud Init and Ansible.
After the initial cloud config has successfully run I further set up each node with Ansible. Nothing fancy, though. I install Docker with geerlingguy.docker and set up a few more users. I especially don't do any firewall tweaking. The only package that I apt install is open-iscsi (required for Longhorn).

One of the three nodes is hosting the Rancher installation, started by
docker run -d --restart=unless-stopped -p 80:80 -p 443:443 --privileged --name rancher rancher/rancher:latest according to the official documentation.

I've created two networks in Hetzner Cloud named rancher and k8s with 10.2.0.0/24 and 10.3.0.0/24 respectively. All three nodes are attached to both networks. There's a Load Balancer attached to the k8s network. At this point I have created a new cluster and naturally I've tried the Canal CNI provider first. I ran into weird issues where requests to a NodePort service failed about 50% of the time. After destroying the cluster and cleaning the nodes (!!) I tried Weave as a CNI provider and it looks like it is running stable and as intended.

This is the command that I've used to provision the cluster on the two remaining nodes:
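The exact command isn't preserved in this archive; a typical Rancher-generated node command with the addresses pinned to the private k8s network would look roughly like this (the server URL, token, checksum, agent image tag and the 10.3.0.2 address are placeholders; the flag set mirrors the node command shown in the original report):

sudo docker run -d --privileged --restart=unless-stopped --net=host \
  -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run \
  rancher/rancher-agent:<version> \
  --server https://rancher.example.com \
  --token <registration-token> \
  --ca-checksum <checksum> \
  --address 10.3.0.2 \
  --internal-address 10.3.0.2 \
  --etcd --controlplane --worker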
For the sake of completeness, this is the Rancher cluster.yaml config:
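The config itself isn't reproduced in this archive; assuming the usual Rancher cluster-YAML layout, the part relevant to the CNI choice would be roughly:

rancher_kubernetes_engine_config:
  network:
    # Weave turned out stable in this setup; the Canal default was the problematic one
    plugin: weave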
With this configuration I was able to create a DaemonSet with an NginX image and a Service with a NodePort 30080 that the Load Balancer routes to. Also the deployment of Longhorn went through without any issues (which has failed in the past). The thing is, when I change the CNI from Weave to Canal everything falls apart. So either the default setup for Canal is buggy or missing some essential configuration. 🤷🏻♂️ I’ll keep playing around with my setup and report any oddities here.
@riker09 The interface name is dictated by the VM type. From Hetzner's docs:

The interface for the first attached network will be named ens10 (for CX, CCX) or enp7s0 (for CPX). Additional interfaces will be named ens11 (CX, CCX) or enp8s0 (CPX) for the second, and ens12 (CX, CCX) or enp9s0 (CPX) for the third.

I think we're on the same page here. 🙂
What is puzzling me is the fact that this is still unsolved. When I explicitly tell the cluster node upon registration to use the private network interface (here: ens11), I would expect Rancher to respect that.

I don't have/see any problem with the nodes connecting to the Rancher cluster via its public IP. Does anybody else object to that?
There seems to be no way to convince an RKE cluster to only use the specified IPs for a given node if the node's primary ethernet interface (eth0 or similar) has an IP, whether that IP is public or private.

In my use-case working with Hetzner Cloud (or bare metal), the nodes have public IPs and traffic is explicitly blocked on the public interface. All private traffic traverses a secondary VLAN interface with a different IP address.
The workaround (a sketch of both pieces follows this comment):
* set the interface options under the network key in cluster.yml as below
* register the node with the public (eth0 primary) IP as --external-address and the secondary IP as --internal-address

Huge caveat with the workaround is that the node still uses eth0 as the node's IP and there's no way to tell Rancher to only use the IPs given! Canal/flannel will always look up the primary interface's IP and use it, whatever you try to spoon-feed on the CLI. The other approach I've seen is to create two additional private interfaces and use their IPs as external/internal to avoid using the primary ethernet interface's IP.
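Putting the two workaround pieces together, a hedged sketch (the interface name and addresses are examples only; the flags follow the --address / --internal-address form used in the original report):

# cluster.yml: pin the CNI to the private VLAN interface under the network key
network:
  plugin: canal
  options:
    canal_iface: vlan4000

Then append the address flags to the usual rancher-agent registration command:

  --address 203.0.113.10        # public (eth0 primary) IP, i.e. the external address
  --internal-address 10.0.0.10  # secondary VLAN IP
  --etcd --controlplane --worker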