kube-router: Sporadic Connection Refused for services on Network Policy sync
Some time ago I informally raised on Slack that there might be an issue with the NetworkPolicyController causing intermittent connection failures while reconciling. I can now confirm that this happens, although it can be hard to recreate.
I think I narrowed it down to this chunk of code in network_policy_controller.go:
// TODO use iptables-restore to better implement the logic, than flush and add rules
err = iptablesCmdHandler.ClearChain("filter", policyChainName)
if err != nil && err.(*iptables.Error).ExitStatus() != 1 {
	return nil, nil, fmt.Errorf("Failed to run iptables command: %s", err.Error())
}
err = npc.processIngressRules(policy, targetDestPodIpSetName, activePolicyIpSets)
if err != nil {
	return nil, nil, err
}
err = npc.processEgressRules(policy, targetSourcePodIpSetName, activePolicyIpSets)
if err != nil {
	return nil, nil, err
}
I tried commenting out the ClearChain line and made sure that ingress/egress rules were only processed once (on kube-router startup), and then I could no longer reproduce the problem.
My guess is that, in some cases, packets are rejected or dropped in the time gap between the chain being cleared and the chain being rebuilt, although I'm not entirely sure. Looking at the dump below, an emptied KUBE-NWPLCY-* chain would let packets fall through to the default REJECT rule in the pod's KUBE-POD-FW-* chain, which answers with icmp-port-unreachable; on the client that would show up as exactly this kind of Connection Refused. What is certain, though, is that I get intermittent Connection Refused errors when talking to services, and they always occur at the exact time of the network policy Sync().
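To make the suspected window concrete, here is a minimal, self-contained sketch of the flush-then-repopulate pattern (this is not the kube-router code; the chain name and CIDRs are made up, and I'm calling github.com/coreos/go-iptables directly, which I believe is the library behind iptablesCmdHandler):

package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}

	const chain = "DEMO-NWPLCY" // hypothetical policy chain

	// 1. Flush the chain. From this moment on, traffic that jumps through
	//    this chain matches nothing here; in kube-router's setup that means
	//    falling through to the default REJECT rule of the pod's
	//    KUBE-POD-FW-* chain.
	if err := ipt.ClearChain("filter", chain); err != nil {
		log.Fatal(err)
	}

	// 2. Re-add the allow rules one by one. In the kube-router scenario,
	//    every packet arriving before this loop finishes would hit that
	//    default REJECT, i.e. "Connection refused" on the client.
	for _, src := range []string{"10.233.0.0/24", "10.233.1.0/24"} {
		if err := ipt.AppendUnique("filter", chain, "-s", src, "-j", "ACCEPT"); err != nil {
			log.Fatal(err)
		}
	}
}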
Reproduce
To reproduce the error, I run something like:
time while true ; do curl -s -o /dev/null http://myservice.default.svc.cluster.local/api || break ; sleep 0.1 ; done
Hint: set --iptables-sync-period low, or apply some continuous changes to the cluster (triggering sync), so that you don't have to wait forever to see the error. You may also need a certain number of network policies for the gap between chain flush and rule creation to be noticeable.
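If it helps: as far as I remember the flag takes a duration value, so e.g. --iptables-sync-period=15s makes the periodic sync (and thus the suspect window) come around often enough for the curl loop above to eventually hit it.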
Suggestions
// TODO use iptables-restore to better implement the logic, than flush and add rules
A fix is already suggested in the comment above ^^, although I don't know whether the following would provide better atomicity:
- Building a new chain
- Changing the chain reference
- (defer) Deleting the old chain
For the sake of atomicity/consistency between different network policies (e.g. when service A talks to service B, with A having an egress rule and B the corresponding ingress rule), perhaps it would be better to rebuild all chains in one loop and then change all the references in a second loop?
Of course, best of all would be to not touch anything that hasn't changed at all. Perhaps iptables-restore can help with that.
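For what it's worth, here is a rough sketch of how the build-new / swap-reference / delete-old idea could look with github.com/coreos/go-iptables (the package and helper names, the temporary chain suffix and the rule slices are all made up, and rollback handling is omitted for brevity):

package npcsketch

import "github.com/coreos/go-iptables/iptables"

// replacePolicyChain is a hypothetical helper sketching the idea: build the
// replacement chain next to the live one, swap the jump in the parent chain,
// and only then remove the old chain.
func replacePolicyChain(ipt *iptables.IPTables, parent, oldChain string, rules [][]string) error {
	newChain := oldChain + "-NEW" // made-up temporary name

	// 1. Build the new chain side by side. Traffic keeps flowing through
	//    the old chain while this runs, so there is no empty-chain window.
	if err := ipt.NewChain("filter", newChain); err != nil {
		return err
	}
	for _, rule := range rules {
		if err := ipt.Append("filter", newChain, rule...); err != nil {
			return err
		}
	}

	// 2. Swap the reference: insert the jump to the new chain first, then
	//    drop the jump to the old one. At every point in time at least one
	//    of the two jumps is in place.
	if err := ipt.Insert("filter", parent, 1, "-j", newChain); err != nil {
		return err
	}
	if err := ipt.Delete("filter", parent, "-j", oldChain); err != nil {
		return err
	}

	// 3. Finally flush and delete the now-unreferenced old chain.
	if err := ipt.ClearChain("filter", oldChain); err != nil {
		return err
	}
	return ipt.DeleteChain("filter", oldChain)
}

The real code would of course have to delete the actual jump rules (which carry physdev and source/destination matches, as in the dump below) rather than a bare -j, and handle failures mid-way, but the ordering is the point.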
Let me hear your thoughts?
Also, a nitpick:
glog.V(1).Info("Starting periodic sync of iptables")
… should probably just say "Starting sync of iptables", since this function is invoked both for API server events and for the periodic reconcile.
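i.e. something like:
glog.V(1).Info("Starting sync of iptables")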
Environment
Kubernetes: v1.9.6
Kube-router: v0.2.0-beta.6
Below is a slightly redacted dump of iptables -t filter -L from the node running the pod I test against:
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- anywhere anywhere
Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-POD-FW-2T5M5FYYBTRHPIYV all -- 10.233.6.51 anywhere PHYSDEV match --physdev-is-bridged /* rule to jump traffic from POD name:node-exporter-m4t4m namespace: monitoring to chain KUBE-POD-FW-2T5M5FYYBTRHPIYV */
KUBE-POD-FW-2T5M5FYYBTRHPIYV all -- 10.233.6.51 anywhere /* rule to jump traffic from POD name:node-exporter-m4t4m namespace: monitoring to chain KUBE-POD-FW-2T5M5FYYBTRHPIYV */
KUBE-POD-FW-5HZOC6QEWDN2OQQY all -- 10.233.6.3 anywhere PHYSDEV match --physdev-is-bridged /* rule to jump traffic from POD name:ors-maintenance-65bf64cf67-5v6c9 namespace: iscrum-dit to chain KUBE-POD-FW-5HZOC6QEWDN2OQQY */
KUBE-POD-FW-5HZOC6QEWDN2OQQY all -- 10.233.6.3 anywhere /* rule to jump traffic from POD name:ors-maintenance-65bf64cf67-5v6c9 namespace: iscrum-dit to chain KUBE-POD-FW-5HZOC6QEWDN2OQQY */
KUBE-POD-FW-6Q423ANJ5FJYSYLE all -- 10.233.6.50 anywhere PHYSDEV match --physdev-is-bridged /* rule to jump traffic from POD name:filebeat-pxzfh namespace: platform to chain KUBE-POD-FW-6Q423ANJ5FJYSYLE */
KUBE-POD-FW-6Q423ANJ5FJYSYLE all -- 10.233.6.50 anywhere /* rule to jump traffic from POD name:filebeat-pxzfh namespace: platform to chain KUBE-POD-FW-6Q423ANJ5FJYSYLE */
KUBE-POD-FW-5HZOC6QEWDN2OQQY all -- anywhere 10.233.6.3 PHYSDEV match --physdev-is-bridged /* rule to jump traffic destined to POD name:ors-maintenance-65bf64cf67-5v6c9 namespace: iscrum-dit to chain KUBE-POD-FW-5HZOC6QEWDN2OQQY */
KUBE-POD-FW-5HZOC6QEWDN2OQQY all -- anywhere 10.233.6.3 /* rule to jump traffic destined to POD name:ors-maintenance-65bf64cf67-5v6c9 namespace: iscrum-dit to chain KUBE-POD-FW-5HZOC6QEWDN2OQQY */
KUBE-POD-FW-6Q423ANJ5FJYSYLE all -- anywhere 10.233.6.50 PHYSDEV match --physdev-is-bridged /* rule to jump traffic destined to POD name:filebeat-pxzfh namespace: platform to chain KUBE-POD-FW-6Q423ANJ5FJYSYLE */
KUBE-POD-FW-6Q423ANJ5FJYSYLE all -- anywhere 10.233.6.50 /* rule to jump traffic destined to POD name:filebeat-pxzfh namespace: platform to chain KUBE-POD-FW-6Q423ANJ5FJYSYLE */
KUBE-POD-FW-2T5M5FYYBTRHPIYV all -- anywhere 10.233.6.51 PHYSDEV match --physdev-is-bridged /* rule to jump traffic destined to POD name:node-exporter-m4t4m namespace: monitoring to chain KUBE-POD-FW-2T5M5FYYBTRHPIYV */
KUBE-POD-FW-2T5M5FYYBTRHPIYV all -- anywhere 10.233.6.51 /* rule to jump traffic destined to POD name:node-exporter-m4t4m namespace: monitoring to chain KUBE-POD-FW-2T5M5FYYBTRHPIYV */
ACCEPT all -- anywhere anywhere /* allow outbound traffic from pods */
ACCEPT all -- anywhere anywhere /* allow inbound traffic to pods */
ACCEPT all -- anywhere anywhere /* allow outbound node port traffic on node interface with which node ip is associated */
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- anywhere anywhere
KUBE-POD-FW-5HZOC6QEWDN2OQQY all -- anywhere 10.233.6.3 /* rule to jump traffic destined to POD name:ors-maintenance-65bf64cf67-5v6c9 namespace: iscrum-dit to chain KUBE-POD-FW-5HZOC6QEWDN2OQQY */
KUBE-POD-FW-6Q423ANJ5FJYSYLE all -- anywhere 10.233.6.50 /* rule to jump traffic destined to POD name:filebeat-pxzfh namespace: platform to chain KUBE-POD-FW-6Q423ANJ5FJYSYLE */
KUBE-POD-FW-2T5M5FYYBTRHPIYV all -- anywhere 10.233.6.51 /* rule to jump traffic destined to POD name:node-exporter-m4t4m namespace: monitoring to chain KUBE-POD-FW-2T5M5FYYBTRHPIYV */
Chain DOCKER-USER (0 references)
target prot opt source destination
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
Chain KUBE-NWPLCY-AJ4HEMWOQH4WRYTX (1 references)
target prot opt source destination
Chain KUBE-NWPLCY-GIIJQFEHBJRDJKP7 (1 references)
target prot opt source destination
Chain KUBE-NWPLCY-JZALGAZ2SIGPNZPY (1 references)
target prot opt source destination
Chain KUBE-NWPLCY-R3ES7SXPUHQHIML7 (1 references)
target prot opt source destination
Chain KUBE-NWPLCY-VYCJOSP4PDSOS2DK (1 references)
target prot opt source destination
Chain KUBE-NWPLCY-XTNPBDVHIRENHZ3T (1 references)
target prot opt source destination
Chain KUBE-POD-FW-2T5M5FYYBTRHPIYV (5 references)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* rule for stateful firewall for pod */ ctstate RELATED,ESTABLISHED
KUBE-NWPLCY-R3ES7SXPUHQHIML7 all -- anywhere anywhere /* run through nw policy denyall */
ACCEPT all -- anywhere 10.233.6.51 /* rule to permit the traffic traffic to pods when source is the pod's local node */ ADDRTYPE match src-type LOCAL
REJECT all -- anywhere anywhere /* default rule to REJECT traffic destined for POD name:node-exporter-m4t4m namespace: monitoring */ reject-with icmp-port-unreachable
Chain KUBE-POD-FW-5HZOC6QEWDN2OQQY (5 references)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* rule for stateful firewall for pod */ ctstate RELATED,ESTABLISHED
KUBE-NWPLCY-AJ4HEMWOQH4WRYTX all -- anywhere anywhere /* run through nw policy outgoing */
KUBE-NWPLCY-JZALGAZ2SIGPNZPY all -- anywhere anywhere /* run through nw policy outgoing-ors */
KUBE-NWPLCY-XTNPBDVHIRENHZ3T all -- anywhere anywhere /* run through nw policy denyall */
KUBE-NWPLCY-VYCJOSP4PDSOS2DK all -- anywhere anywhere /* run through nw policy payara */
ACCEPT all -- anywhere 10.233.6.3 /* rule to permit the traffic traffic to pods when source is the pod's local node */ ADDRTYPE match src-type LOCAL
REJECT all -- anywhere anywhere /* default rule to REJECT traffic destined for POD name:ors-maintenance-65bf64cf67-5v6c9 namespace: iscrum-dit */ reject-with icmp-port-unreachable
Chain KUBE-POD-FW-6Q423ANJ5FJYSYLE (5 references)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* rule for stateful firewall for pod */ ctstate RELATED,ESTABLISHED
KUBE-NWPLCY-GIIJQFEHBJRDJKP7 all -- anywhere anywhere /* run through nw policy denyall */
ACCEPT all -- anywhere 10.233.6.50 /* rule to permit the traffic traffic to pods when source is the pod's local node */ ADDRTYPE match src-type LOCAL
REJECT all -- anywhere anywhere /* default rule to REJECT traffic destined for POD name:filebeat-pxzfh namespace: platform */ reject-with icmp-port-unreachable
About this issue
- State: closed
- Created 6 years ago
- Comments: 17 (15 by maintainers)
Commits related to this issue
- Created patch for kuber-router connection refused issue - Instead of clearing the iptables firewall chains for each resync, new chains are now generated side-by-side with the existing ones. - Chain n... — committed to johanot/kube-router by deleted user 6 years ago
- Fix for network policy connection refused issue (#461) * Instead of clearing the iptables firewall chains for each resync, new chains are now generated side-by-side with the existing ones. * Chain n... — committed to johanot/kube-router by deleted user 6 years ago
- Fix for network policy connection refused issue (#461) (#471) * Instead of clearing the iptables firewall chains for each resync, new chains are now generated side-by-side with the existing ones. ... — committed to cloudnativelabs/kube-router by johanot 6 years ago
Sure. We will do a release over the weekend.