flannel: requests hang for 60+ seconds when calling an HTTP service pod
After upgrading flanneld to v0.20.1, running curl against an HTTP service pod on a different node via its ClusterIP hangs for 60+ seconds.
Expected Behavior
curl returns without hanging.
Current Behavior
curl hangs for 60+ seconds.
Possible Solution
Possibly caused by double NAT, but I'm not sure.
Steps to Reproduce (for bugs)
curl hangs when the nat table's POSTROUTING chain is ordered like this:
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
FLANNEL-POSTRTG all -- anywhere anywhere /* flanneld masq */
KUBE-POSTROUTING all -- anywhere anywhere /* kubernetes postrouting rules */
It works fine when the order is:
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
KUBE-POSTROUTING all -- anywhere anywhere /* kubernetes postrouting rules */
FLANNEL-POSTRTG all -- anywhere anywhere /* flanneld masq */
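The two listings above differ only in the order of the jumps in the nat POSTROUTING chain. As a minimal sketch (not part of flannel or kube-proxy), the hypothetical Python helpers below parse the text printed by iptables -t nat -L POSTROUTING and report whether FLANNEL-POSTRTG is evaluated before KUBE-POSTROUTING, the order under which curl hung in this report. They assume the listing layout shown above and that every rule has a jump target.

```python
from __future__ import annotations


def chain_targets(listing: str) -> list[str]:
    """Hypothetical helper: return the jump targets of a chain, in rule order,
    parsed from `iptables -t nat -L POSTROUTING` output (with or without -v).

    The "target" column is located from the header line, so the extra
    pkts/bytes columns added by -v are handled too. Assumes every rule has a
    target; a rule without -j leaves that column blank and shifts the fields.
    """
    targets: list[str] = []
    target_col = None
    for line in listing.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Chain":
            continue  # skip blank lines and the "Chain ... (policy ...)" line
        if target_col is None:
            # The first remaining line is the column header.
            if "target" in fields:
                target_col = fields.index("target")
            continue
        if target_col < len(fields):
            targets.append(fields[target_col])
    return targets


def flannel_before_kube(listing: str) -> bool:
    """Hypothetical helper: True if FLANNEL-POSTRTG is jumped to before
    KUBE-POSTROUTING (the ordering that made curl hang in this report)."""
    targets = chain_targets(listing)
    if "FLANNEL-POSTRTG" not in targets or "KUBE-POSTROUTING" not in targets:
        return False
    return targets.index("FLANNEL-POSTRTG") < targets.index("KUBE-POSTROUTING")


# The two listings from this report.
HANGING = """Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
FLANNEL-POSTRTG all -- anywhere anywhere /* flanneld masq */
KUBE-POSTROUTING all -- anywhere anywhere /* kubernetes postrouting rules */"""

WORKING = """Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
KUBE-POSTROUTING all -- anywhere anywhere /* kubernetes postrouting rules */
FLANNEL-POSTRTG all -- anywhere anywhere /* flanneld masq */"""

print(flannel_before_kube(HANGING))  # True
print(flannel_before_kube(WORKING))  # False
```

On a node, the listing could be obtained with subprocess.run(["iptables", "-t", "nat", "-L", "POSTROUTING", "-n"], capture_output=True, text=True).stdout (root required); -n only skips reverse DNS lookups and does not change the column layout.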
Context
This PR (https://github.com/kubernetes/kubernetes/pull/92035) looks like it should fix this issue, but I still see the problem with flanneld v0.20.1.
Your Environment
- Flannel version: v0.20.1
- Backend used (e.g. vxlan or udp): vxlan
- Etcd version: 3.5.3
- Kubernetes version (if used): v1.25.4
- Operating System and version: Archlinux (kernel version 6.0.8)
- Link to your project (optional):
About this issue
- State: closed
- Created 2 years ago
You can increase the verbosity of iptables output if you use -vL.

@rkonfj, we believe your kernel still has a VXLAN bug that causes this problem when traffic is double-NATed. We can avoid it by not double-NATing, as @rbrtbnfgl suggests. But just to verify: with the original flannel iptables rules (and thus double NAT), could you execute the following on your nodes:
Then try again. That should take the VXLAN bug out of the equation, so it should work even with double NAT.
OK, now it's clear. This bug only happens on some kernel versions, which is why it didn't happen on my setup. I'll try to update the iptables rules.
@rbrtbnfgl Pod-to-pod traffic via the service IP may work fine, but node-to-pod traffic via the service IP hangs.
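To see which POSTROUTING rules node-to-pod and pod-to-pod requests actually hit, the verbose listing suggested earlier in the thread (-v) adds per-rule pkts and bytes counters. The hypothetical helper below is a sketch that maps each jump target to its packet counter. It assumes the counters are printed exactly (-x), because without -x iptables rounds large values to forms like 12K that int() cannot parse, and, like a plain listing parser, it assumes every rule has a jump target.

```python
from __future__ import annotations


def target_packet_counts(listing: str) -> dict[str, int]:
    """Hypothetical helper: map each jump target to its packet counter, parsed
    from `iptables -t nat -L POSTROUTING -v -x` output.

    Columns are matched by name against the header line (pkts, bytes, target,
    ...). Counters of repeated targets are summed. Raises ValueError if a
    counter is rounded (e.g. "12K"), i.e. if -x was not used.
    """
    counts: dict[str, int] = {}
    header = None
    for line in listing.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Chain":
            continue  # skip blank lines and the "Chain ... (policy ...)" line
        if header is None:
            header = fields  # e.g. ["pkts", "bytes", "target", "prot", ...]
            continue
        # zip() stops at the header length, so trailing /* comment */ words
        # after the destination column are ignored.
        row = dict(zip(header, fields))
        if "target" in row and "pkts" in row:
            target = row["target"]
            counts[target] = counts.get(target, 0) + int(row["pkts"])
    return counts
```

Running iptables -t nat -L POSTROUTING -v -x -n before and after a single curl to the ClusterIP, from a node and then from a pod, and comparing the two results shows which chain each request went through. Because nat table rules are only consulted for the first packet of each new connection, expect the counters to move by small amounts per request.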