kubernetes: DualStack: IP family of status.podIP can be wrong depending on CNI-plugin
What happened:
The status.podIP is taken from status.podIPs[0] without regard for the main family of the cluster. No warning or error is given if status.podIP gets the wrong family.
What you expected to happen:
That status.podIP is taken from the “main” IP family or, at the very least, that the POD is not started and an error is given if it is not.
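The “at the very least” option can be sketched in Go as a check on the address list before status.podIP is set. This is a minimal sketch, assuming the main family is known and passed in as mainIsIPv6; checkPrimaryFamily is a hypothetical name, not an existing kubelet or API-server function, and the IPv4 address used in main is made up.

```go
package main

import (
	"fmt"
	"net"
)

// checkPrimaryFamily is a hypothetical check: it returns an error when the
// first pod IP (the one that becomes status.podIP) is not of the main family.
func checkPrimaryFamily(podIPs []string, mainIsIPv6 bool) error {
	if len(podIPs) == 0 {
		return nil // no IPs allocated yet; nothing to check
	}
	ip := net.ParseIP(podIPs[0])
	if ip == nil {
		return fmt.Errorf("invalid pod IP %q", podIPs[0])
	}
	// To4() returns nil for addresses that are not IPv4.
	isIPv6 := ip.To4() == nil
	if isIPv6 != mainIsIPv6 {
		return fmt.Errorf("podIPs[0] %s is not of the main IP family", podIPs[0])
	}
	return nil
}

func main() {
	// IPv6 first, as with the bridge/host-local config in this report.
	// The IPv4 address is a hypothetical example from 11.0.1.0/24.
	podIPs := []string{"1100::102", "11.0.1.2"}
	fmt.Println(checkPrimaryFamily(podIPs, false)) // main family IPv4
}
```

Rejecting the pod is the blunt option: it surfaces the misconfiguration but still leaves the CNI-plugin responsible for the order.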
How to reproduce it (as minimally and precisely as possible):
Use the bridge CNI-plugin and host-local IPAM. Use a single-node cluster so that host-local does not assign the same address to PODs on different nodes. This may be CRI-plugin dependent; I use cri-o/1.18.3.
$ cat /etc/cni/net.d/10-bridge.conf
{
  "cniVersion": "0.4.0",
  "name": "cni-x",
  "type": "bridge",
  "bridge": "cbr0",
  "isDefaultGateway": true,
  "hairpinMode": true,
  "ipam": {
    "type": "host-local",
    "ranges": [
      [ { "subnet": "1100::100/120" } ],
      [ { "subnet": "11.0.1.0/24" } ]
    ]
  }
}
$ ls /opt/cni/bin/
bridge* host-local* loopback*
The PODs are assigned dual addresses, which has been supported since v1.9. Note the order: IPv6 first.
In a dual-stack or single-stack cluster whose main family is IPv4, status.podIP will get an IPv6 address:
$ kubectl get pod alpine-daemonset-4t6w5 -o json | jq .status.podIP
"1100::102"
Anything else we need to know?:
First, note that this is not a dual-stack problem: status.podIP may get the wrong family in a single-stack cluster if the PODs have dual addresses. The bug may, however, have been introduced with dual-stack support; before status.podIPs existed, K8s may have selected the correct family. I have not checked that.
This is reported and discussed in https://github.com/kubernetes/kubernetes/issues/94505. There it is considered a CNI-plugin problem: the CNI-plugin must present the addresses in the order K8s wants them. The problem is that CNI-plugins don’t know this order.
The addresses take this path:
CNI-plugin -> CRI-plugin -> kubelet -> API-server
The problem can be addressed in any of these places.
The current situation is that the full responsibility lies with the CNI-plugin; it must send the addresses in the “correct order”. This is not the best place. CNI-plugins are not a K8s-only thing, and requiring CNI-plugins to support a “preferred-family” or something similar in order to be “Kubernetes compliant” should be avoided.
The CRI-plugin is a K8s-only thing, and a “preferred-family” option could be introduced so that the CRI-plugin sorts the address array before sending it to kubelet. This is, however, undesirable since it adds configuration complexity: it puts a responsibility on the user or installation tool to configure the “preferred-family” for all CNI-plugins, now and in the future.
The best option is to handle this in K8s itself, where the “main” family is known. The API, however, currently defines status.podIPs as:
// IP addresses allocated to the pod. This list
// is inclusive, i.e. it includes the default IP address stored in the
// "PodIP" field, and this default IP address must be recorded in the
// 0th entry (PodIPs[0]) of the slice. The list is empty if no IPs have
// been allocated yet.
PodIPs []PodIP `json:"podIPs,omitempty" protobuf:"bytes,6,opt,name=podIPs"`
Is this really necessary? All communication works even if podIPs[0] is not of the main family. EndpointSlices seem to get this right regardless of order. Endpoints, on the other hand, seem to take the (old) podIP. So perhaps it is sufficient to say:
The "PodIP" field is set to the first address in podIPs that matches the main IP family.
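The proposed rule can be sketched in Go as follows. This is a minimal sketch, assuming the component that sets status.podIP knows the main family and receives it as a boolean; selectPodIP and mainIsIPv6 are hypothetical names, and falling back to podIPs[0] when no address matches is an assumption of this sketch, not something the proposal specifies.

```go
package main

import (
	"fmt"
	"net"
)

// selectPodIP is a hypothetical helper implementing the proposed rule:
// podIP is the first entry in podIPs that belongs to the main family.
// If no entry matches (assumed fallback), podIPs[0] is used; an empty
// list yields "".
func selectPodIP(podIPs []string, mainIsIPv6 bool) string {
	for _, s := range podIPs {
		ip := net.ParseIP(s)
		if ip == nil {
			continue // skip unparsable entries
		}
		if (ip.To4() == nil) == mainIsIPv6 {
			return s
		}
	}
	if len(podIPs) > 0 {
		return podIPs[0]
	}
	return ""
}

func main() {
	podIPs := []string{"1100::102", "11.0.1.2"} // IPv4 address is hypothetical
	fmt.Println(selectPodIP(podIPs, false))      // prints 11.0.1.2
}
```

With this rule podIP can differ from podIPs[0], which the API comment quoted above currently forbids; that is exactly the question raised here.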
If podIP really must be podIPs[0], then K8s should sort the array so that podIPs[0] belongs to the main family.
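A minimal Go sketch of the sorting alternative, assuming the main family is passed in as a boolean; sortByFamily is a hypothetical name, not an existing kubelet function. A stable sort keeps the CNI-plugin's order within each family, so only the family order changes.

```go
package main

import (
	"fmt"
	"net"
	"sort"
)

// sortByFamily is a hypothetical helper: it reorders podIPs in place so that
// addresses of the main family come first, otherwise preserving the order
// reported by the CNI-plugin. Afterwards podIP can be set to podIPs[0].
func sortByFamily(podIPs []string, mainIsIPv6 bool) {
	isMain := func(s string) bool {
		ip := net.ParseIP(s)
		return ip != nil && (ip.To4() == nil) == mainIsIPv6
	}
	sort.SliceStable(podIPs, func(i, j int) bool {
		// Main-family addresses sort before all others.
		return isMain(podIPs[i]) && !isMain(podIPs[j])
	})
}

func main() {
	podIPs := []string{"1100::102", "11.0.1.2"} // IPv4 address is hypothetical
	sortByFamily(podIPs, false)                  // main family IPv4
	fmt.Println(podIPs[0], podIPs)               // prints 11.0.1.2 [11.0.1.2 1100::102]
}
```

This keeps the invariant podIP == podIPs[0] intact, at the cost of K8s reordering what the CNI-plugin reported.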
The problem exists on all supported K8s versions and on “master”.
Current situation for dual-stack supporting CNI-plugins
If installed in the default way, the situation is:
- Calico always sends IPv4 first
- Cilium always sends IPv6 first
I don’t think you can control the order, but I have not asked.
Environment:
- Kubernetes versions: v1.17.12, v1.18.9, v1.19.2, v1.20.0-alpha.1, master v1.20.0-alpha.1.257+112dbd55860e60
- Cloud provider or hardware configuration: None
- OS (e.g: cat /etc/os-release): xcluster
- Kernel: linux-5.8.1
- Install tools: None
- Network plugin and version: bridge, host-local v0.8.7
- Others: CRI-plugin: cri-o 1.18.3
About this issue
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 36 (34 by maintainers)
So, to sum up:
- … pod.Spec.PodIP, regardless of any other cluster configuration.
- … pod.Spec.PodIP that is not the IP family the admin considers the cluster’s primary IP family
- … “kubectl get” may not be the ones the administrator wants
- … --node-ip should also affect the sorting of pod IPs, because really why wouldn’t you want that?
oh, and:
- … podIPs, but the apiserver will only accept podIPs if it is either a single IP or a dual-stack pair.
The reason podIP must match podIPs[0] is related to updates to an existing object by an old client. See https://github.com/kubernetes/kubernetes/pull/88505 for a related problem when this was not done.
… podIPs=[podIP]. That means that new clients (aware of both fields) must set podIP and podIPs[0] to match.
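The two API rules mentioned in this discussion (podIP must equal podIPs[0], and podIPs must be either a single IP or a dual-stack pair) can be sketched as one validation function. This is a simplified, hypothetical sketch; validatePodIPs is not the apiserver's actual validation code, and real validation checks more than this.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// validatePodIPs is a simplified, hypothetical version of the two rules:
// podIP must equal podIPs[0], and podIPs must hold one IP or one IPv4/IPv6 pair.
func validatePodIPs(podIP string, podIPs []string) error {
	if len(podIPs) == 0 {
		return nil // nothing allocated yet
	}
	if podIP != podIPs[0] {
		return fmt.Errorf("podIP %q must equal podIPs[0] %q", podIP, podIPs[0])
	}
	if len(podIPs) > 2 {
		return errors.New("at most two pod IPs (one per family) are allowed")
	}
	var isIPv6 []bool
	for _, s := range podIPs {
		ip := net.ParseIP(s)
		if ip == nil {
			return fmt.Errorf("invalid pod IP %q", s)
		}
		isIPv6 = append(isIPv6, ip.To4() == nil)
	}
	if len(isIPv6) == 2 && isIPv6[0] == isIPv6[1] {
		return errors.New("two pod IPs must be a dual-stack pair (one IPv4, one IPv6)")
	}
	return nil
}

func main() {
	// IPv4 address is hypothetical. The IPv6-first list passes: nothing
	// in these rules refers to the cluster's main family.
	fmt.Println(validatePodIPs("1100::102", []string{"1100::102", "11.0.1.2"}))
	fmt.Println(validatePodIPs("11.0.1.2", []string{"1100::102", "11.0.1.2"}))
}
```

The first call succeeds even though IPv6 comes first, which illustrates why validation alone does not catch the wrong family in status.podIP.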