weave: Crashing weave-net pod when adding a node to a k8s cluster without supplying a pod network CIDR
What you expected to happen?
To not have to supply --pod-network-cidr=10.32.0.0/12 to kubeadm init when setting up a Weave network, and for the weave-net pod to remain stable when adding a node to the cluster.
What happened?
When I set up a k8s cluster using kubeadm init --apiserver-advertise-address=192.168.1.31 and then add one node, the newly created weave-net pod enters a CrashLoop while starting its 2nd container. This keeps the new node from ever leaving the NotReady state.
The weave-net pod for the master node looks healthy and stays at 2/2 RUNNING the entire time.
How to reproduce it?
NOTE - The k8s master and node are Ubuntu 18.04 VMs running on an Ubuntu 19.10 box.
- Tear down any existing k8s cluster to get back to square one: drain and delete all nodes, run kubeadm reset on all nodes and the master, and on the master delete the /etc/cni/net.d and $HOME/.kube/config folders.
- On the master, run kubeadm init --apiserver-advertise-address=192.168.1.31
- Run the commands kubeadm prints at the end to set up the kubeconfig correctly (mkdir…)
- Run kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" to deploy Weave.
- Wait for all pods to correctly come online.
- Add one node to the cluster with the join command from the kubeadm output on the master.
- On the master, run kubectl get pods --all-namespaces
At this point you can watch the pods being created. If the issue occurs, the 2nd container in the newest weave-net pod starts crashing and never comes online, which keeps the node in a NotReady state. (A consolidated shell sketch of these steps follows below.)
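For convenience, here is the reproduction sequence above as a rough shell sketch. The kubeconfig setup block is the standard one kubeadm prints at the end of init; the join token and hash are placeholders, not the real values from my cluster.
# On all nodes and the master: reset any previous cluster state
sudo kubeadm reset
# On the master: remove leftover CNI config and kubeconfig
sudo rm -rf /etc/cni/net.d
rm -rf $HOME/.kube/config
# On the master: initialise the cluster (no --pod-network-cidr supplied)
sudo kubeadm init --apiserver-advertise-address=192.168.1.31
# Kubeconfig setup, as printed by kubeadm at the end of init
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Deploy Weave Net
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
# On the worker node: join the cluster (token and hash are placeholders)
sudo kubeadm join 192.168.1.31:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
# Back on the master: watch the pods come up
kubectl get pods --all-namespaces -o wide -w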
Anything else we need to know?
I have recreated this issue a few times now to debug it - the reproduction rate is not 100% (it has happened to me about 4 out of 5 times with the above steps).
NOTE - when I add --pod-network-cidr=10.32.0.0/12 to my init command when creating the cluster, I have not hit this issue in 4 out of 4 attempts; all pods/containers come up as expected (the full working init command is shown below).
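For clarity, the init command that has worked for me every time is simply the one above with the CIDR flag added:
sudo kubeadm init --apiserver-advertise-address=192.168.1.31 --pod-network-cidr=10.32.0.0/12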
I originally opened an issue with Kubernetes thinking this was just a documentation problem (I did not see the CIDR flag mentioned in the Kubernetes setup docs or in the Weave docs). I am opening an issue here because we did not see a CIDR supplied in the log files when reproducing this bug, but did see one once I got a working cluster up.
Before trying Weave I once set up a Calico network for my cluster, but kept seeing crashing pods with that, so I moved to Weave.
Versions:
KubeCtl:
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:30:10Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:22:30Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Weave:
Using the Weave CNI plugin for Kubernetes
Docker:
Client:
Version: 18.09.7
API version: 1.39
Go version: go1.10.1
Git commit: 2d0083d
Built: Fri Aug 16 14:20:06 2019
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.09.7
API version: 1.39 (minimum version 1.12)
Go version: go1.10.1
Git commit: 2d0083d
Built: Wed Aug 14 19:41:23 2019
OS/Arch: linux/amd64
Experimental: false
uname -a
Linux kubemaster 5.3.0-26-generic #28~18.04.1-Ubuntu SMP Wed Dec 18 16:40:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Logs:
The kube-proxy ConfigMap output once I have set up the master (before adding the node that gets the crashing net pod):
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: ""
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:15:43Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "238"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: f12e3e4b-73b2-439e-8be9-7289d2bce49a
And the output once I added the one node and started seeing the crashing net pod:
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: ""
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:15:43Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "238"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: f12e3e4b-73b2-439e-8be9-7289d2bce49a
And for comparison, here is the output after I add the one node when I supply the CIDR in the init command:
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: 10.32.0.0/12
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:27:20Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "242"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: 10ce9974-f193-4b5c-9efb-78c0317746d2
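For anyone reproducing this, the ConfigMap dumps above can be pulled, and the relevant field checked, with something like:
# Dump the kube-proxy ConfigMap
kubectl -n kube-system get configmap kube-proxy -o yaml
# Quick check of just the cluster CIDR that kube-proxy was given
kubectl -n kube-system get configmap kube-proxy -o yaml | grep clusterCIDR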
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 1
- Comments: 16 (4 by maintainers)
No, it just tried and never connected. I only let it sit for maybe 20 seconds or less - I think it would have connected by then if things were OK.
I have a firewall, but I don't think this touches the firewall at all, since the 10.x.x.x addresses are on a virtual network hosted on my Linux box, which runs the kube cluster in 3 VMs.
Plus, all of this works if I supply the CIDR flag when running the initial init command. So I don't strictly need this fixed, since I can just remake my cluster, supply the CIDR, and have everything work. Maybe a note could be added to the setup docs that supplying the CIDR can help people who run into this - for whatever reason.
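If it helps the docs discussion: the CIDR can also be supplied declaratively through a kubeadm config file instead of the flag. A minimal sketch, assuming kubeadm's v1beta2 config API (Kubernetes 1.17); the values just mirror the flags used above:
# Write a kubeadm config that carries the same settings as the flags
cat <<'EOF' > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.1.31   # same as --apiserver-advertise-address
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
networking:
  podSubnet: 10.32.0.0/12          # same as --pod-network-cidr
EOF
sudo kubeadm init --config kubeadm-config.yaml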
When you specify --pod-network-cidr=10.32.0.0/12 to kubeadm init, the specified CIDR gets passed on to kube-proxy, which helps kube-proxy tell internal traffic from external traffic. That should not in any way cause weave-net pods (or any other pods) to crash. Please check the logs to see why the second container, which is weave-npc, is crashing for you.
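For reference, a rough way to pull those logs from the crashing pod on the new node (the pod name is a placeholder):
# List the weave-net pods and note which one landed on the new node
kubectl get pods -n kube-system -l name=weave-net -o wide
# Current and previous logs from the weave-npc container of that pod
kubectl logs -n kube-system weave-net-xxxxx -c weave-npc
kubectl logs -n kube-system weave-net-xxxxx -c weave-npc --previous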