weave: Crashing weave-net pod when adding a node to a k8s cluster without supplying a network CIDR

What you expected to happen?

To not have to supply the --pod-network-cidr=10.32.0.0/12 flag when setting up a Weave network with kubeadm init, and for the weave-net pod to remain stable when adding a node to the cluster.

What happened?

When I set up a Kubernetes cluster using kubeadm init --apiserver-advertise-address=192.168.1.31 and add one node, the newly created weave-net pod enters a CrashLoop while starting its second container. This prevents the new node from leaving the NotReady state.

The weave-net pod on the master node looks healthy and shows 2/2 Running the entire time.

How to reproduce it?

NOTE - The Kubernetes master and node are Ubuntu 18.04 VMs running on an Ubuntu 19.10 host.

  1. Tear down the existing cluster to get back to square one
    • drain and delete all nodes
    • kubeadm reset on all nodes and master
    • On master: delete /etc/cni/net.d and $HOME/.kube/config.
  2. On master - run kubeadm init --apiserver-advertise-address=192.168.1.31
    • Run the commands kubeadm prints at the end to set up the kubeconfig correctly (mkdir…)
    • Run kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" to deploy weave
  3. Wait for all pods to correctly come online
  4. Add one node to the cluster with join cmd in the kubeadm output from the master.
  5. On master - run kubectl get pods --all-namespaces (a condensed command sketch of these steps follows this list)
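For reference, here is the whole cycle condensed into shell commands as I run it (a sketch only: the join token and hash are placeholders taken from the kubeadm init output, and the kubeconfig lines are the standard ones kubeadm prints):

  # master: tear down and re-init (steps 1-2)
  sudo kubeadm reset
  sudo rm -rf /etc/cni/net.d $HOME/.kube/config
  sudo kubeadm init --apiserver-advertise-address=192.168.1.31
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
  kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

  # worker: join with the command printed by kubeadm init (step 4)
  sudo kubeadm join 192.168.1.31:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>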

At this point you can monitor the pods being created. If the issue occurs, the second container in the newest weave-net pod starts crashing and never comes online, which keeps the node in a NotReady state.
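Roughly what I use to watch this happen:

  kubectl get pods --all-namespaces -o wide -w   # the newest weave-net pod never reaches 2/2 and starts crash-looping
  kubectl get nodes                              # the new node stays NotReady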

Anything else we need to know?

I have recreated this issue a few times now to debug it - the reproduction rate is not 100% (it has happened about 4 out of 5 times with the steps above).

NOTE - when I add --pod-network-cidr=10.32.0.0/12 to my init command when creating the cluster, I have not hit this issue (4 out of 4 attempts); all pods/containers come up as expected.
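For clarity, the init invocation that has worked for me simply combines the two flags already mentioned above:

  kubeadm init --apiserver-advertise-address=192.168.1.31 --pod-network-cidr=10.32.0.0/12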

I opened an issue with Kubernetes thinking this was just a documentation problem (I did not see the CIDR flag in the Kubernetes setup docs or in the Weave docs). I am opening an issue here since we did not see a CIDR supplied in the log files when reproducing this bug, but did see one once I got a working cluster up.

Before trying Weave I once set up a Calico network for my cluster, but kept seeing crashing pods with that as well, so I moved to Weave.

Versions:

kubectl:

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:30:10Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:22:30Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

Weave

Using the Weave CNI plugin for Kubernetes
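The exact Weave Net image version isn't listed above; one way to read it off the running DaemonSet would be something like this (assuming the standard weave-net DaemonSet name created by the manifest):

  kubectl -n kube-system get daemonset weave-net -o jsonpath='{.spec.template.spec.containers[*].image}'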

Docker:

Client:
 Version:           18.09.7
 API version:       1.39
 Go version:        go1.10.1
 Git commit:        2d0083d
 Built:             Fri Aug 16 14:20:06 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.09.7
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.1
  Git commit:       2d0083d
  Built:            Wed Aug 14 19:41:23 2019
  OS/Arch:          linux/amd64
  Experimental:     false

uname -a

Linux kubemaster 5.3.0-26-generic #28~18.04.1-Ubuntu SMP Wed Dec 18 16:40:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Logs:
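(The kube-proxy ConfigMap dumps below were presumably captured with something along these lines; the exact command used is my assumption:)

  kubectl -n kube-system get configmap kube-proxy -o yaml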

The kube-proxy ConfigMap output once I have set up the master (before adding the node that gets the crashing weave-net pod):

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: ""
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:15:43Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "238"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: f12e3e4b-73b2-439e-8be9-7289d2bce49a

And the output once I added the one node and started seeing the crashing net pod:

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: ""
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:15:43Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "238"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: f12e3e4b-73b2-439e-8be9-7289d2bce49a

And for comparison, here is the output after I add the one node when I supply the CIDR in the init command:

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: 10.32.0.0/12
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:27:20Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "242"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: 10ce9974-f193-4b5c-9efb-78c0317746d2


Most upvoted comments

No, it just kept trying and never connected. I only let it sit for maybe 20 seconds or less - I think it would have connected by then if things were OK.

I have a firewall, but I don’t think this traffic would touch the firewall at all, since the 10.x.x.x addresses belong to a virtual network hosted on my Linux box, which runs my kube cluster on three VMs.

Plus, all of this works if I supply the CIDR flag when running the initial init command, so I don’t need this fixed - I can just remake my cluster, supply the CIDR, and have everything work. Maybe a note could be added to the setup docs that supplying the CIDR could help people who run into this, for whatever reason.

When you specify --pod-network-cidr=10.32.0.0/12 to kubeadm init, the specified CIDR is passed to kube-proxy, which helps kube-proxy distinguish internal from external traffic. That should not in any way cause weave-net pods (or any pods) to crash.
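That difference is exactly what the ConfigMap dumps above show; for example:

  kubectl -n kube-system get configmap kube-proxy -o yaml | grep clusterCIDR
  # clusterCIDR: ""            <- kubeadm init without --pod-network-cidr
  # clusterCIDR: 10.32.0.0/12  <- kubeadm init with --pod-network-cidr=10.32.0.0/12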

Please check the logs to see why the second container, which is weave-npc, is crashing for you.
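A sketch of how to pull those logs (the pod name is a placeholder; weave and weave-npc are the two containers in a weave-net pod, and the pods carry the name=weave-net label):

  kubectl -n kube-system get pods -l name=weave-net -o wide                    # find the weave-net pod scheduled on the new node
  kubectl -n kube-system logs <weave-net-pod-on-new-node> -c weave-npc         # the crashing second container
  kubectl -n kube-system logs <weave-net-pod-on-new-node> -c weave --previous  # previous run of the first container, if it restarted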