kubernetes: endpoints for daemonset in host network not ready and inconsistent with pod IPs
Is this a BUG REPORT or FEATURE REQUEST?: /kind bug
What happened: After deploying the Prometheus node-exporter as a DaemonSet + Service with these manifests: https://gist.github.com/discordianfish/77b1e4f82ebe966ec5ce341aa3366155
…the node-exporter gets deployed to all nodes as expected:
$ kubectl get pod -l app=node-exporter -o json|jq '.items[]|{"name": .metadata.name, "ip": .status.podIP}'|jq -s 'sort_by(.ip)'
[
{
"name": "node-exporter-2b14f",
"ip": "10.32.130.2"
},
{
"name": "node-exporter-nkqhr",
"ip": "10.32.130.3"
},
{
"name": "node-exporter-7w9pq",
"ip": "10.32.130.4"
},
{
"name": "node-exporter-j3hfm",
"ip": "10.32.130.5"
},
{
"name": "node-exporter-pp1sq",
"ip": "10.32.130.6"
}
]
The pod IPs match the host IPs, which I assume is expected given that the node-exporter runs in the host network namespace:
$ kubectl get nodes -o json| jq '.items[].status.addresses[]|select(.type == "InternalIP")'
{
"address": "10.32.130.6",
"type": "InternalIP"
}
{
"address": "10.32.130.5",
"type": "InternalIP"
}
{
"address": "10.32.130.4",
"type": "InternalIP"
}
{
"address": "10.32.130.2",
"type": "InternalIP"
}
{
"address": "10.32.130.3",
"type": "InternalIP"
}
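For completeness, a quick way to cross-check this is to diff the sorted pod IPs against the sorted node InternalIPs (just a sketch; empty diff output means the two sets match):
$ diff \
    <(kubectl get pod -l app=node-exporter -o jsonpath='{.items[*].status.podIP}' | tr ' ' '\n' | sort) \
    <(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}' | tr ' ' '\n' | sort)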
So far so good. The problem is that all but one of the endpoints are "not ready", and they are all inconsistent with their pod IPs:
$ kubectl get endpoints -l app=node-exporter -o json | jq '.items[0].subsets[0].addresses[]|{"name": .targetRef.name, "ip": .ip}'
{
"name": "node-exporter-pp1sq",
"ip": "10.32.130.6"
}
$ kubectl get endpoints -l app=node-exporter -o json | jq '.items[0].subsets[0].notReadyAddresses[]|{"name": .targetRef.name, "ip": .ip}'
{
"name": "node-exporter-2b14f",
"ip": "10.32.130.3"
}
{
"name": "node-exporter-nkqhr",
"ip": "10.32.130.4"
}
{
"name": "node-exporter-j3hfm",
"ip": "10.32.130.5"
}
{
"name": "node-exporter-7w9pq",
"ip": "10.32.130.7"
}
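To make the mismatch easier to see, here's a rough sketch that prints, for each endpoint address, the pod it targets next to that pod's actual podIP (if the endpoints were consistent, both IPs would agree on every line):
$ kubectl get endpoints -l app=node-exporter -o json \
    | jq -r '.items[0].subsets[0] | (.addresses[]?, .notReadyAddresses[]?) | "\(.targetRef.name) \(.ip)"' \
    | while read pod epip; do
        echo "$pod endpoint=$epip pod=$(kubectl get pod "$pod" -o jsonpath='{.status.podIP}')"
      done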
This leads Prometheus to miss one instance and apply the wrong labels. In this particular case that could lead to misinterpretation of monitoring results, and in other cases probably even to catastrophic failure.
What you expected to happen: I expected the endpoint IPs to match the pod IPs.
How to reproduce it (as minimally and precisely as possible): This is a pretty much vanilla GKE cluster with https://gist.github.com/discordianfish/77b1e4f82ebe966ec5ce341aa3366155 applied.
Anything else we need to know?: The node-exporter is running in the host network and PID namespaces.
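For reference, the pod template looks roughly like the sketch below (hypothetical and trimmed down; the real manifests are in the gist above, and the image and port are just the node-exporter defaults I'd assume):
$ kubectl apply -f - <<'EOF'
# Hypothetical minimal DaemonSet showing the hostNetwork/hostPID setup;
# see the gist for the actual manifests used.
apiVersion: extensions/v1beta1   # DaemonSet API group on Kubernetes 1.6
kind: DaemonSet
metadata:
  name: node-exporter
  labels:
    app: node-exporter
spec:
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - containerPort: 9100
EOF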
Environment:
- Kubernetes version (use kubectl version): 1.6.4
- Cloud provider or hardware configuration: GKE
I have a theory here. This bug happens under the following conditions:

Validation for UpdateEndpoints uses the EndpointAddress IP as the key and checks whether the node it is associated with has changed: https://github.com/kubernetes/kubernetes/blob/v1.9.0-beta.0/pkg/apis/core/validation/validation.go#L4302

For example, something like this may happen:
1. Originally, Node1 has IP1 and Node2 has IP2. Pod1 (host network) on Node1 has IP1, and Pod2 on Node2 has IP2.
2. Due to some disruption, the instances' internal IPs change: Node1 gets IP2 and Node2 gets IP1, which means Pod1 gets IP2 and Pod2 gets IP1.
3. The endpoint controller tries to reconcile the existing Endpoints object by swapping the EndpointAddresses of Pod1 and Pod2. However, the validation logic keys on the IP and sees the corresponding node name change, so it rejects the update and the Endpoints object ends up stuck in this error state.
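If this theory is right, it should be visible in the API objects: the nodeName recorded on each endpoint address would disagree with the node that currently owns that IP. A rough way to eyeball it (a sketch, reusing the label from above and assuming nodeName is populated on the endpoint addresses):
$ kubectl get endpoints -l app=node-exporter -o json \
    | jq '.items[0].subsets[0] | (.addresses[]?, .notReadyAddresses[]?) | {ip, nodeName}'
$ kubectl get nodes -o json \
    | jq '.items[] | {name: .metadata.name, ip: (.status.addresses[] | select(.type == "InternalIP") | .address)}'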
I've migrated off GKE, so I'm not sure whether it's still an issue. I haven't observed it since reporting it here, though.