meshnet-cni: Cluster data plane fails after initial deploy

Conditions:

New kind cluster with kindnet
meshnet-cni @v0.3.0 installed

Intermittently, Pods deployed immediately after meshnet come up with the cluster network unavailable. For example, kube-prometheus-stack starts with an initialization Job, and that Job fails to reach the API server:

> kubectl -n mimesis-data logs mimesis-mon-mimesis-data-admission-create-cslj5
W0829 17:07:15.999396       1 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
{"err":"Get \"https://10.96.0.1:443/api/v1/namespaces/mimesis-data/secrets/mimesis-mon-mimesis-data-admission\": dial tcp 10.96.0.1:443: connect: no route to host","level":"fatal","msg":"error getting secret","source":"k8s/k8s.go:109","time":"2021-08-29T17:07:19Z"}

When this condition occurs, it affects all Pods. If I exec into a Pod and ping cluster-cidr addresses, every attempt returns "no route to host".

I can sometimes kick networking back into working order by generating some external network traffic (e.g., running apt-get update from the kindnet pod).

About this issue

State: closed
Created 3 years ago
Comments: 18 (18 by maintainers)

Most upvoted comments

Thanks for reporting this, @Cerebus. Seems like the eth0 interface may have been set up. When this does happen, is the issue persistent, or does it only affect Pods that were deployed immediately after? Can you document the steps to reproduce this? And if you happen to catch this again, can you collect the output of ip addr && ip route inside a Pod and the journalctl logs from one of the kind nodes (something like docker exec <kind-node-name> journalctl -xn --no-pager)?

networkop on Aug 30, 2021
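
The diagnostics requested in the comment above could be gathered in one pass with a small helper script. This is only a sketch and not part of the original thread: the function names run and collect_diag and the DRY_RUN switch are hypothetical, and it assumes the affected Pod's image ships the ip tool and that the kind node runs as a Docker container. The commands themselves are the ones named in the comment.

```shell
#!/bin/sh
# Sketch: collect Pod-side and node-side network diagnostics.
# DRY_RUN=1 prints each command instead of executing it (hypothetical switch).

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: collect_diag NAMESPACE POD KIND_NODE
collect_diag() {
    ns=$1
    pod=$2
    node=$3

    # Interface and routing state as seen from inside the Pod.
    # Assumes the container image includes the `ip` binary.
    run kubectl -n "$ns" exec "$pod" -- ip addr
    run kubectl -n "$ns" exec "$pod" -- ip route

    # Recent journal entries from the kind node container.
    run docker exec "$node" journalctl -xn --no-pager
}
```

With a running Pod, something like collect_diag mimesis-data <pod-name> <kind-node-name> > diag.txt 2>&1 would capture everything in one file to attach to the issue. Setting DRY_RUN=1 first shows the exact commands without touching the cluster.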