ovn-kubernetes: Connectivity problems with overlay network on Windows
I have a connectivity problem with one of my Windows nodes, and the error messages are not helping me diagnose it.
The following steps are currently working:
- The `ovn-kubernetes-node` service is running.
- The `ovn-controller` service is running.
- The `ovsdb-server` and `ovs-vswitchd` services are also running.
- The `kubelet` service is running as well.
Kubectl shows that the node has joined the cluster and is in status Ready. I can also deploy pods to the node; however, the pods are not being started.
The integration bridge `k8s-***` has been created and has been assigned an overlay IP (e.g. `10.254.10.2`).
I can ping the following IPs:
- 10.254.10.2 (integration bridge)
- 10.254.10.1 (gateway ip)
- 192.168.254.9 (host network IP)
- 192.168.254.1 (default gateway and next hop IP for ovn-kubernetes-node)
- IP of kubernetes master
- Host IP of other Windows and Linux k8s nodes.
I can also ping the gateway IPs of other nodes, for example `10.245.0.1` and `10.245.1.1`, but the ping times are under 1 ms, which is impossibly fast, so I do not think the pings are actually being relayed to those nodes.
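One way to confirm that these replies are answered locally rather than by the remote node is to watch for Geneve traffic on the remote node's uplink while the ping runs (a sketch, not part of the original report; it assumes `tcpdump` is available on the remote Linux node, and the interface name `enp0s3` is a placeholder — Geneve's registered UDP port is 6081):

```shell
#!/bin/sh
# Sketch: verify from the remote Linux node whether Geneve-encapsulated
# traffic actually arrives while the Windows node pings that node's
# gateway IP (e.g. 10.245.0.1). "enp0s3" is a placeholder interface name.

check_geneve_traffic() {
    iface="${1:-enp0s3}"
    if command -v tcpdump >/dev/null 2>&1 && [ -e "/sys/class/net/$iface" ] \
        && [ "$(id -u)" = "0" ]; then
        # Capture for 10 seconds; an empty capture during the ping means the
        # echo replies are answered locally by OVN, not relayed over the tunnel.
        timeout 10 tcpdump -ni "$iface" udp port 6081
    else
        # Not in a position to capture here; print the command to run instead.
        echo "run as root on the remote node: tcpdump -ni $iface udp port 6081"
    fi
}

check_geneve_traffic "$@"
```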
I cannot ping the integration bridges of other Linux/Windows nodes (e.g. `10.245.0.2` or `10.245.1.2`), and there is no connectivity to pods on other nodes.
The `ovn-kubernetes-node.log` is full of failing ADD/DEL operations, and I constantly see these messages: https://gist.github.com/lanoxx/d3e9090187eda8b193665de6b3119f7b
I have also already disabled the Windows firewall.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 25 (19 by maintainers)
Commits related to this issue
- Merge pull request #683 from alexanderConstantinescu/bug/1997104 [release-4.7] Bug 1997104: fix reserve joinSwitch LRP IPs — committed to astoycos/ovn-kubernetes by openshift-merge-robot 3 years ago
FYI, just a terminology correction: “br-int” is the integration bridge. k8s-** is the “node mgmt interface”.
A few things to check:
- Why aren't the pods being started? What is the error message, both in `kubectl describe` and in the ovnkube log on the node where the pod is being deployed? Also check the ovnkube log on the master node: is it creating annotations with IP/MAC information for that pod?
- `ovs-ofctl dump-flows br-int` should show a lot of flows; this confirms that there is connectivity between the central databases and the local OVS.
- `ovs-vsctl show` should show "geneve" tunnels to the other nodes.
- `ovs-dpctl show` should show a geneve tunnel.
- When you ping from the `k8s-**` interface of this machine to another machine, run `ovs-dpctl dump-flows`. This shows the real-time kernel flows for actual packets. If the only traffic is the ping, there should not be many flows, and we can see whether it is getting dropped.
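The checks above can be collected into a small script (a sketch; it assumes the standard OVS CLI tools `ovs-ofctl`, `ovs-vsctl`, and `ovs-dpctl` are on the PATH of the affected node, and the 10-flow threshold is my own rough heuristic, not an official number):

```shell
#!/bin/sh
# Sketch of the diagnostic checklist above; run on the affected node.

classify_flow_count() {
    # br-int normally carries many flows; almost none suggests the node
    # has no connection to the central OVN databases.
    if [ "$1" -lt 10 ]; then echo "suspicious"; else echo "ok"; fi
}

run_checks() {
    if ! command -v ovs-ofctl >/dev/null 2>&1; then
        echo "OVS tools not found; run this on the affected node"
        return 0
    fi
    n=$(ovs-ofctl dump-flows br-int | wc -l)
    echo "br-int flow count: $n ($(classify_flow_count "$n"))"

    # Geneve tunnels to the other nodes should appear in both views:
    ovs-vsctl show | grep -i geneve
    ovs-dpctl show | grep -i geneve

    # While pinging another node from k8s-*, dump the kernel flows to see
    # whether the ICMP traffic enters the tunnel or is dropped:
    ovs-dpctl dump-flows
}

run_checks
```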