kubernetes: endpoints for daemonset in host network not ready and inconsistent with pod IPs
Is this a BUG REPORT or FEATURE REQUEST?: /kind bug
What happened: After deploying the Prometheus node-exporter as a DaemonSet + Service with these manifests: https://gist.github.com/discordianfish/77b1e4f82ebe966ec5ce341aa3366155
…the node-exporter gets deployed to all nodes as expected:
$ kubectl get pod -l app=node-exporter -o json|jq '.items[]|{"name": .metadata.name, "ip": .status.podIP}'|jq -s 'sort_by(.ip)'
[
{
"name": "node-exporter-2b14f",
"ip": "10.32.130.2"
},
{
"name": "node-exporter-nkqhr",
"ip": "10.32.130.3"
},
{
"name": "node-exporter-7w9pq",
"ip": "10.32.130.4"
},
{
"name": "node-exporter-j3hfm",
"ip": "10.32.130.5"
},
{
"name": "node-exporter-pp1sq",
"ip": "10.32.130.6"
}
]
The pod IPs match the host IPs, which I assume is expected given that the node-exporter runs in the host network namespace:
$ kubectl get nodes -o json| jq '.items[].status.addresses[]|select(.type == "InternalIP")'
{
"address": "10.32.130.6",
"type": "InternalIP"
}
{
"address": "10.32.130.5",
"type": "InternalIP"
}
{
"address": "10.32.130.4",
"type": "InternalIP"
}
{
"address": "10.32.130.2",
"type": "InternalIP"
}
{
"address": "10.32.130.3",
"type": "InternalIP"
}
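For completeness, a quick way to cross-check this is to diff the sorted pod IPs against the sorted node InternalIPs (just a sketch; empty diff output means the two sets match):
$ diff \
    <(kubectl get pod -l app=node-exporter -o jsonpath='{.items[*].status.podIP}' | tr ' ' '\n' | sort) \
    <(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}' | tr ' ' '\n' | sort)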
So far so good. The problem is that all but one of the endpoints are "not ready", and they are all inconsistent with their pod IPs:
$ kubectl get endpoints -l app=node-exporter -o json | jq '.items[0].subsets[0].addresses[]|{"name": .targetRef.name, "ip": .ip}'
{
"name": "node-exporter-pp1sq",
"ip": "10.32.130.6"
}
$ kubectl get endpoints -l app=node-exporter -o json | jq '.items[0].subsets[0].notReadyAddresses[]|{"name": .targetRef.name, "ip": .ip}'
{
"name": "node-exporter-2b14f",
"ip": "10.32.130.3"
}
{
"name": "node-exporter-nkqhr",
"ip": "10.32.130.4"
}
{
"name": "node-exporter-j3hfm",
"ip": "10.32.130.5"
}
{
"name": "node-exporter-7w9pq",
"ip": "10.32.130.7"
}
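To make the mismatch easier to see, here's a rough sketch that prints, for each endpoint address, the pod it targets next to that pod's actual podIP (if the endpoints were consistent, both IPs would agree on every line):
$ kubectl get endpoints -l app=node-exporter -o json \
    | jq -r '.items[0].subsets[0] | (.addresses[]?, .notReadyAddresses[]?) | "\(.targetRef.name) \(.ip)"' \
    | while read pod epip; do
        echo "$pod endpoint=$epip pod=$(kubectl get pod "$pod" -o jsonpath='{.status.podIP}')"
      done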
This leads Prometheus to miss one instance and apply the wrong labels. In this particular case that could lead to misinterpretation of monitoring results, and in other cases probably even to catastrophic failure.
What you expected to happen: I expected the endpoint IPs to match the pod IPs.
How to reproduce it (as minimally and precisely as possible): This is a pretty much vanilla GKE cluster with https://gist.github.com/discordianfish/77b1e4f82ebe966ec5ce341aa3366155 applied.
Anything else we need to know?: The node-exporter is running in the host network and PID namespaces.
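For reference, the pod template looks roughly like the sketch below (hypothetical and trimmed down; the real manifests are in the gist above, and the image and port are just the node-exporter defaults I'd assume):
$ kubectl apply -f - <<'EOF'
# Hypothetical minimal DaemonSet showing the hostNetwork/hostPID setup;
# see the gist for the actual manifests used.
apiVersion: extensions/v1beta1   # DaemonSet API group on Kubernetes 1.6
kind: DaemonSet
metadata:
  name: node-exporter
  labels:
    app: node-exporter
spec:
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - containerPort: 9100
EOF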
Environment:
- Kubernetes version (use kubectl version): 1.6.4
- Cloud provider or hardware configuration: GKE
I have a theory here. This bug happens under the following conditions:

Validation for UpdateEndpoints uses the EndpointAddress IP as the key and checks whether the node it is associated with has changed: https://github.com/kubernetes/kubernetes/blob/v1.9.0-beta.0/pkg/apis/core/validation/validation.go#L4302

For example, something like this may happen:
1. Originally, Node1 has IP1 and Node2 has IP2. Pod1 (host network) on Node1 has IP1, and Pod2 on Node2 has IP2.
2. Due to some disruption, the instances' internal IPs change: Node1 gets IP2 and Node2 gets IP1, which means Pod1 gets IP2 and Pod2 gets IP1.
3. The endpoint controller tries to reconcile the existing Endpoints object by swapping the EndpointAddresses of Pod1 and Pod2. However, the validation logic keys on the IP and sees the corresponding node name change, so it rejects the update and the Endpoints object ends up stuck in this error state.
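If this theory is right, it should be visible in the API objects: the nodeName recorded on each endpoint address would disagree with the node that currently owns that IP. A rough way to eyeball it (a sketch, reusing the label from above and assuming nodeName is populated on the endpoint addresses):
$ kubectl get endpoints -l app=node-exporter -o json \
    | jq '.items[0].subsets[0] | (.addresses[]?, .notReadyAddresses[]?) | {ip, nodeName}'
$ kubectl get nodes -o json \
    | jq '.items[] | {name: .metadata.name, ip: (.status.addresses[] | select(.type == "InternalIP") | .address)}'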
I've migrated off GKE, so I'm not sure whether it's still an issue. I haven't observed it since reporting it here, though.