origin: hostPort iptables rule is lost after node restarts

A hostPort mapped by a DaemonSet will disappear after the node (or just Docker) restarts. I am not sure whether this bug is still present in the latest version.

Version

OpenShift Origin v3.6.1+008f2d5

Steps To Reproduce
  1. Create a DaemonSet with a hostPort; see https://gist.github.com/vfreex/fc768e2ecdd6c18047bb9be5e5e707aa for an example (a minimal sketch also follows these steps).
  2. An iptables rule is added to the KUBE-HP-* chain of the nat table.
  3. Restart Docker on a particular node.
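
For concreteness, here is a minimal repro sketch along the lines of the linked gist. The manifest, names, and port numbers are illustrative rather than taken from the original report, and the KUBE-HOSTPORTS/KUBE-HP-* chain names should be verified on your node:

```sh
# Minimal DaemonSet with a hostPort (illustrative; the linked gist is fuller)
kubectl apply -f - <<'EOF'
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nginx-hostport
spec:
  template:
    metadata:
      labels:
        app: nginx-hostport
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          hostPort: 8080
EOF

# On a node, confirm the hostPort rule landed in the nat table
iptables -t nat -S | grep -E 'KUBE-HOSTPORTS|KUBE-HP'

# Then restart Docker on that node
systemctl restart docker
```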
Current Result

After several minutes, the hostPort on that node becomes unreachable and the iptables rule in the KUBE-HP-* chain disappears.
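
On the affected node this is straightforward to confirm; the port number below comes from the sketch above and is an assumption, not part of the original report:

```sh
# The kubelet-managed hostPort rule is gone from the nat table...
iptables -t nat -S | grep KUBE-HP || echo "no hostPort rules left"

# ...and the published port no longer answers
curl -m 2 http://127.0.0.1:8080/ || echo "hostPort unreachable"
```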

Expected Result

The hostPort should be mapped to the new Pod.

Additional Information
  1. If the iptables rule is added to the DOCKER chain instead, this bug does not happen, although I don’t know how OpenShift/Kubernetes decides which chain to use.
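
For comparison (this reading of rule ownership is mine and is not confirmed in the issue): rules in the DOCKER chain are installed by dockerd itself, so dockerd re-creates them when it restarts, while the KUBE-HOSTPORTS/KUBE-HP-* rules are written by kubelet's hostport setup and only come back when kubelet re-runs that setup for the sandbox. A quick way to see which chain holds a given mapping:

```sh
# Rules owned by dockerd (e.g. from `docker run -p 8081:80 nginx`);
# dockerd re-installs these on restart
iptables -t nat -S DOCKER

# Rules owned by kubelet's hostport setup; these are the ones that vanish
iptables -t nat -S KUBE-HOSTPORTS
iptables -t nat -S | grep KUBE-HP
```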

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 25 (19 by maintainers)

Most upvoted comments

Investigated and was able to reproduce locally (at least a variant of the issue) using the nginx daemonset and restarting docker. Analysis:

  1. when docker is restarted, kubelet notices the sandbox has died and terminates it. That termination clears the hostport chains too
  2. kubelet tries to start a new sandbox, but docker is down and this fails
  3. kubelet tries to start a 3rd sandbox, which works, and gets into the CNI network plugin for setup
  4. hostport rules get added
  5. PLEG requests pod sandbox status, which of course returns an empty IP because the sandbox is not ready yet. This state gets cached in the status manager and a SyncPod is queued.
  6. Sandbox creation finishes and the IP is assigned and available
  7. SyncPod() runs using the cached status from step (5)
  8. because none of the sandboxes from the cached status have an IP address, SyncPod() thinks the sandboxes are all dead and starts another one. This of course kills the one from step (6)
  9. repeat (the churn can be observed on the node as shown below)
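
One way to watch this loop from the node itself (assuming infra containers are named k8s_POD_*, as they were under dockershim; the chain names are the same assumption as above):

```sh
# Sandbox (pause) containers churn while the hostPort rules flap in and out
watch -n2 'docker ps --filter name=k8s_POD; echo ---; iptables -t nat -S | grep KUBE-HP'
```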

It’s currently unclear what should be done about this; it’s a completely upstream problem. We’ve fixed a number of upstream issues with PLEG status racing with SyncPod in the past.