kubernetes: weave-net CrashLoopBackOff for the second node
Is this a request for help?
I think it is an issue with either the software or the documentation, but I am not quite sure. I started with a question on Stack Overflow: http://stackoverflow.com/questions/39872332/how-to-fix-weave-net-crashloopbackoff-for-the-second-node
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
I think it is a bug or a request to improve the documentation
Kubernetes version (use kubectl version):
1.4.0
Environment:
- Cloud provider or hardware configuration: Vagrant
- OS (e.g. from /etc/os-release): Ubuntu 16.04
- Kernel (e.g. uname -a):
- Install tools: kubeadm init/join
- Others:
What happened:
I have 2 VM nodes. Each can reach the other by hostname (through /etc/hosts) and by IP address. One has been provisioned with kubeadm as a master, the other as a worker node. Following the instructions (http://kubernetes.io/docs/getting-started-guides/kubeadm/), I have added weave-net. The list of pods looks like the following:
vagrant@vm-master:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-vm-master 1/1 Running 0 3m
kube-system kube-apiserver-vm-master 1/1 Running 0 5m
kube-system kube-controller-manager-vm-master 1/1 Running 0 4m
kube-system kube-discovery-982812725-x2j8y 1/1 Running 0 4m
kube-system kube-dns-2247936740-5pu0l 3/3 Running 0 4m
kube-system kube-proxy-amd64-ail86 1/1 Running 0 4m
kube-system kube-proxy-amd64-oxxnc 1/1 Running 0 2m
kube-system kube-scheduler-vm-master 1/1 Running 0 4m
kube-system kubernetes-dashboard-1655269645-0swts 1/1 Running 0 4m
kube-system weave-net-7euqt 2/2 Running 0 4m
kube-system weave-net-baao6 1/2 CrashLoopBackOff 2 2m
CrashLoopBackOff appears for each worker node connected. I have spent several hours playing with network interfaces, but the network seems fine. I found a similar question on Stack Overflow, where the answer advised looking into the logs, with no follow-up. So, here are the logs:
vagrant@vm-master:~$ kubectl logs weave-net-baao6 -c weave --namespace=kube-system
2016-10-05 10:48:01.350290 I | error contacting APIServer: Get https://100.64.0.1:443/api/v1/nodes: dial tcp 100.64.0.1:443: getsockopt: connection refused; trying with blank env vars
2016-10-05 10:48:01.351122 I | error contacting APIServer: Get http://localhost:8080/api: dial tcp [::1]:8080: getsockopt: connection refused
Failed to get peers
What you expected to happen:
I would expect the weave-net pod to be in the Running state.
How to reproduce it (as minimally and precisely as possible):
I have not done anything special, just followed the Getting Started documentation. If it is essential, I can share the Vagrant project I used to provision everything. Please let me know if you need it.
About this issue
- State: closed
- Created 8 years ago
- Reactions: 24
- Comments: 56 (27 by maintainers)
Commits related to this issue
- Merge pull request #34607 from errordeveloper/apiserver-adv-addr Automatic merge from submit-queue Append first address from `--api-advertise-addresses` to `kube-apiserver` flags **What this PR doe... — committed to kubernetes/kubernetes by deleted user 8 years ago
- daemonset component label for kube-proxy is actually kube-proxy and not kube-proxy-amd64. The last patch for issue https://github.com/kubernetes/kubernetes/issues/34101 is broken. This fix re-establis... — committed to ramukima/k8s-playground by ramukima 8 years ago
- Fixing a network problem that prevents skyDNS pod from starting I ran across a similar problem to the one I was having in getting skyDNS to run properly. It was at: https://github.com/kubernetes/kube... — committed to willauld/HA-kube-vagrant by willauld 7 years ago
As this thread is getting quite noisy, here is a recap.

First, find out what IP address you want to use on the master; it's probably the one on the second network interface. For this example I'll use `IP="172.42.42.1"`. Next, run `kubeadm init --api-advertise-addresses=$IP`. Now, you want to append `--advertise-address` to `kube-apiserver` in the static pod manifest. And finally, you need to update the flags in the `kube-proxy` daemonset and append `--proxy-mode=userspace`; both edits are sketched below. It seems Calico has got a similar issue.
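The concrete commands were lost from this recap during extraction; here is a minimal sketch under the thread's assumptions (host-only master IP 172.42.42.1, kubeadm 1.4 file layout). The `kubectl edit`/`delete` pair is quoted elsewhere in this thread; a jq version of the manifest edit appears further down.

```sh
# Assumed host-only IP of the master; substitute your own.
IP="172.42.42.1"

# Initialise the master so the API server advertises the host-only address.
kubeadm init --api-advertise-addresses=$IP

# Append --advertise-address=$IP to the kube-apiserver command in
# /etc/kubernetes/manifests/kube-apiserver.json (jq sketch later in this
# thread); kubelet restarts the pod when the manifest file changes.

# Switch kube-proxy to userspace mode, then recreate its pods:
kubectl -n kube-system edit ds kube-proxy-amd64   # add --proxy-mode=userspace to the container command
kubectl -n kube-system delete pods -l name=kube-proxy-amd64
```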
What could I try to progress this issue further?
I am starting kubeadm with the advertise-addresses option: `kubeadm init --api-advertise-addresses=$master_address`, where my $master_address is not the NAT interface. It is still not enough to resolve this issue.
I’m also experiencing the same issue:
Any ideas on how to deal with the issue?
I am having the same issue as @avkonst. If you leave validation on, you get this error:

`error validating "STDIN": error validating data: items[0].apiVersion not set;`

I included the input from `kubectl -n kube-system get ds -l 'component=kube-proxy-amd64' -o json | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--proxy-mode=userspace"]'` and both kind and apiVersion are clearly there:

```json
{
  "kind": "List",
  "apiVersion": "v1",
  "metadata": {},
  "items": [
    {
      "spec": {
        "template": {
          "spec": {
            "containers": [
              {
                "command": [
                  "--cluster-cidr=10.32.0.0/12"
                ]
              }
            ]
          }
        }
      }
    }
  ]
}
```

`kubectl apply -f -` seems to not get the outermost portion of the input, which is where the kind and apiVersion both are.
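Since the error above is raised by client-side validation ("if you leave validation on"), one possible workaround, not confirmed in this thread, is to skip that check when piping the patched JSON back:

```sh
# The validation error suggests the items inside the List are missing
# kind/apiVersion; --validate=false skips the client-side schema check.
# Untested here; inspect the daemonset afterwards to confirm the change.
kubectl -n kube-system get ds -l 'component=kube-proxy-amd64' -o json \
  | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--proxy-mode=userspace"]' \
  | kubectl apply --validate=false -f -
```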
Ok, so it turns out that this flag is not enough; we still have an issue reaching the `kubernetes` service IP. The simplest solution to this is to run `kube-proxy` with `--proxy-mode=userspace`. To enable this, you can use `kubectl -n kube-system edit ds kube-proxy-amd64 && kubectl -n kube-system delete pods -l name=kube-proxy-amd64`.

Okay, I ran into this exact same issue and here is how I fixed it.
This problem seems to be due to kube-proxy looking at the wrong network interface. If you look at the kube-proxy logs on a worker node, you will most likely see it picking up the NAT address. This is the wrong network interface: kube-proxy should be using the master node's IP address, not the NAT IP address.
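The log excerpt was lost from this comment, but checking it yourself is straightforward. A sketch, using the worker's kube-proxy pod name from the listing in the issue description:

```sh
# Find the kube-proxy pod running on the worker, then dump its log and
# look at which node IP it picked up (the NAT address when misconfigured).
kubectl -n kube-system get pods -o wide | grep kube-proxy
kubectl -n kube-system logs kube-proxy-amd64-oxxnc
```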
As far as I know, kube-proxy gets this value from the Kube API Server when starting up. The Kube API Server's documentation states that if the `--advertise-address` flag isn't set it defaults to `--bind-address`, and if `--bind-address` isn't set it defaults to the host's default interface, which in my case and yours is the NAT interface and isn't what we want. So what I did was set the Kube API Server's `--advertise-address` flag and everything started working.

So right after Step 2 and before Step 3 of Installing Kubernetes on Linux with kubeadm, you will need to update your `/etc/kubernetes/manifests/kube-apiserver.json` and add the `--advertise-address` flag to point to your master node's IP address. For example, my master node's IP address is `172.28.128.2`, so right after Step 2 I do the manifest edit (sketched below).

I am not too sure if this is a valid long-term solution, because if the default kube-apiserver.json changes, those changes wouldn't be reflected by what I am doing. Ideally, I think the user would want some way to set these flags via kubeadm, or maybe the user should be responsible for patching the JSON themselves. Thoughts?
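The command itself was lost in extraction; a minimal sketch of that manifest edit, assuming the kubeadm-generated file is a Pod manifest whose first container is the apiserver (verify the `.spec.containers[0].command` path against your own file first):

```sh
# Append --advertise-address to the kube-apiserver static pod manifest.
sudo jq '.spec.containers[0].command += ["--advertise-address=172.28.128.2"]' \
    /etc/kubernetes/manifests/kube-apiserver.json > /tmp/kube-apiserver.json
sudo mv /tmp/kube-apiserver.json /etc/kubernetes/manifests/kube-apiserver.json
# kubelet watches /etc/kubernetes/manifests and restarts the pod on change.
```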
However, it still may be a good idea to update Step 2 of Installing Kubernetes on Linux with kubeadm to at least mention to users that they can update the kube component flags by modifying their JSON found at `/etc/kubernetes/manifests/`.

I have the same issue. I'm using VirtualBox to run 2 VMs based on a minimal CentOS 7 image. All VMs are attached to 2 interfaces, a NAT and a host-only network. The two VMs are able to connect to each other using the host-only network interfaces.
I also tried the instructions for Calico and Canal, and I cannot make them work either.
Same problem, solved it with https://stackoverflow.com/questions/39872332/how-to-fix-weave-net-crashloopbackoff-for-the-second-node
Resolved my issue by adding a routing rule so the node machines use eth1 to reach the kubernetes service IP range. Example:
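The example itself was lost in extraction; a sketch, assuming eth1 is the host-only interface and the 100.64.0.0/12 service range implied by the 100.64.0.1 address in the logs above (substitute your own values):

```sh
# On each node: route the kubernetes service CIDR via the host-only
# interface instead of the default (NAT) route.
sudo ip route add 100.64.0.0/12 dev eth1
```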
@avkonst see https://github.com/kubernetes/kubernetes/pull/34607.
Also, you can do this for now:
I encountered this issue too.
Adding `--advertise-address` when starting kube-apiserver solved this issue.
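One quick way to confirm the flag took effect (standard kubectl, not from this thread): the advertise address becomes the endpoint of the built-in `kubernetes` service:

```sh
# Should show the master's host-only IP, not the NAT address.
kubectl get endpoints kubernetes
```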