weave: Crashing weave-net pod when adding a node to a k8s cluster without supplying a network CIDR

What you expected to happen?

To not have to supply the --pod-network-cidr=10.32.0.0/12 flag when setting up a Weave network with kubeadm init, and for the weave-net pod to remain stable when adding a node to the cluster.

What happened?

When I set up a Kubernetes cluster using kubeadm init --apiserver-advertise-address=192.168.1.31 and add one node, the newly created weave-net pod enters a CrashLoop while starting its second container. This prevents the new node from leaving the NotReady state.

The weave-net pod on the master node looks healthy and shows 2/2 Running the entire time.

How to reproduce it?

NOTE - The Kubernetes master and node are Ubuntu 18.04 VMs running on an Ubuntu 19.10 host.

  1. Tear down the existing cluster to get back to square one
    • drain and delete all nodes
    • kubeadm reset on all nodes and master
    • On master: delete /etc/cni/net.d and $HOME/.kube/config.
  2. On master - run kubeadm init --apiserver-advertise-address=192.168.1.31
    • Run the commands kubeadm prints at the end to set up the kubeconfig correctly (mkdir…)
    • Run kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" to deploy weave
  3. Wait for all pods to correctly come online
  4. Add one node to the cluster with join cmd in the kubeadm output from the master.
  5. On master - run kubectl get pods --all-namespaces (a condensed command sketch of these steps follows this list)
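For reference, here is the whole cycle condensed into shell commands as I run it (a sketch only: the join token and hash are placeholders taken from the kubeadm init output, and the kubeconfig lines are the standard ones kubeadm prints):

  # master: tear down and re-init (steps 1-2)
  sudo kubeadm reset
  sudo rm -rf /etc/cni/net.d $HOME/.kube/config
  sudo kubeadm init --apiserver-advertise-address=192.168.1.31
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
  kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

  # worker: join with the command printed by kubeadm init (step 4)
  sudo kubeadm join 192.168.1.31:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>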

At this point you can monitor the pods being created. If the issue occurs, the second container in the newest weave-net pod starts crashing and never comes online, which keeps the node in a NotReady state.
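Roughly what I use to watch this happen:

  kubectl get pods --all-namespaces -o wide -w   # the newest weave-net pod never reaches 2/2 and starts crash-looping
  kubectl get nodes                              # the new node stays NotReady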

Anything else we need to know?

I have recreated this issue a few times now to debug it - the reproduction rate is not 100% (it has happened about 4 out of 5 times with the steps above).

NOTE - when I add --pod-network-cidr=10.32.0.0/12 to my init command when creating the cluster, I have not hit this issue (4 out of 4 attempts); all pods/containers come up as expected.
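For clarity, the init invocation that has worked for me simply combines the two flags already mentioned above:

  kubeadm init --apiserver-advertise-address=192.168.1.31 --pod-network-cidr=10.32.0.0/12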

I opened an issue with Kubernetes thinking this was just a documentation problem (I did not see the CIDR flag in the Kubernetes setup docs or in the Weave docs). I am opening an issue here since we did not see a CIDR supplied in the log files when reproducing this bug, but did see one once I got a working cluster up.

Before trying Weave I once set up a Calico network for my cluster, but kept seeing crashing pods with that as well, so I moved to Weave.

Versions:

kubectl:

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:30:10Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:22:30Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

Weave

Using the Weave CNI plugin for Kubernetes
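The exact Weave Net image version isn't listed above; one way to read it off the running DaemonSet would be something like this (assuming the standard weave-net DaemonSet name created by the manifest):

  kubectl -n kube-system get daemonset weave-net -o jsonpath='{.spec.template.spec.containers[*].image}'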

Docker:

Client:
 Version:           18.09.7
 API version:       1.39
 Go version:        go1.10.1
 Git commit:        2d0083d
 Built:             Fri Aug 16 14:20:06 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.09.7
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.1
  Git commit:       2d0083d
  Built:            Wed Aug 14 19:41:23 2019
  OS/Arch:          linux/amd64
  Experimental:     false

uname -a

Linux kubemaster 5.3.0-26-generic #28~18.04.1-Ubuntu SMP Wed Dec 18 16:40:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Logs:
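(The kube-proxy ConfigMap dumps below were presumably captured with something along these lines; the exact command used is my assumption:)

  kubectl -n kube-system get configmap kube-proxy -o yaml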

The kube-proxy ConfigMap output once I have set up the master (before adding the node that gets the crashing weave-net pod):

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: ""
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:15:43Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "238"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: f12e3e4b-73b2-439e-8be9-7289d2bce49a

And the output once I added the one node and started seeing the crashing net pod:

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: ""
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:15:43Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "238"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: f12e3e4b-73b2-439e-8be9-7289d2bce49a

And for comparison, here is the output after I add the one node when I supply the CIDR in the init command:

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: 10.32.0.0/12
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:27:20Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "242"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: 10ce9974-f193-4b5c-9efb-78c0317746d2


Most upvoted comments

No, it just kept trying and never connected. I only let it sit for maybe 20 seconds or less - I think it would have connected by then if things were OK.

I have a firewall, but I don’t think this traffic would touch the firewall at all, since the 10.x.x.x addresses belong to a virtual network hosted on my Linux box, which runs my kube cluster on three VMs.

Plus, all of this works if I supply the CIDR flag when running the initial init command, so I don’t need this fixed - I can just remake my cluster, supply the CIDR, and have everything work. Maybe a note could be added to the setup docs that supplying the CIDR could help people who run into this, for whatever reason.

When you specify --pod-network-cidr=10.32.0.0/12 to kubeadm init, the specified CIDR is passed to kube-proxy, which helps kube-proxy distinguish internal from external traffic. That should not in any way cause weave-net pods (or any pods) to crash.
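That difference is exactly what the ConfigMap dumps above show; for example:

  kubectl -n kube-system get configmap kube-proxy -o yaml | grep clusterCIDR
  # clusterCIDR: ""            <- kubeadm init without --pod-network-cidr
  # clusterCIDR: 10.32.0.0/12  <- kubeadm init with --pod-network-cidr=10.32.0.0/12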

Please check the logs to see why the second container, which is weave-npc, is crashing for you.
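A sketch of how to pull those logs (the pod name is a placeholder; weave and weave-npc are the two containers in a weave-net pod, and the pods carry the name=weave-net label):

  kubectl -n kube-system get pods -l name=weave-net -o wide                    # find the weave-net pod scheduled on the new node
  kubectl -n kube-system logs <weave-net-pod-on-new-node> -c weave-npc         # the crashing second container
  kubectl -n kube-system logs <weave-net-pod-on-new-node> -c weave --previous  # previous run of the first container, if it restarted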