weave: Segmentation fault on ARM with version 2.5.2

I installed weave on a Raspberry PI 3 master node by issuing these commands:

$ sudo kubeadm init

$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

This resulted in a problem with one of the weave pods:

olavt@k8s-master-1:~ $ kubectl get pods --namespace=kube-system
NAME                                   READY   STATUS             RESTARTS   AGE
coredns-5644d7b6d9-8stg2               0/1     Pending            0          18m
coredns-5644d7b6d9-kcv2h               0/1     Pending            0          18m
etcd-k8s-master-1                      1/1     Running            0          18m
kube-apiserver-k8s-master-1            1/1     Running            0          18m
kube-controller-manager-k8s-master-1   1/1     Running            1          18m
kube-proxy-n9whx                       1/1     Running            0          18m
kube-scheduler-k8s-master-1            1/1     Running            1          18m
weave-net-95vdr                        1/2     CrashLoopBackOff   8          18m

olavt@k8s-master-1:~ $ kubectl logs weave-net-95vdr -c weave --namespace=kube-system
Segmentation fault (core dumped)
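A little more context than that single log line can be pulled with a couple of stock kubectl commands (a sketch, using the same pod name as above):

kubectl describe pod weave-net-95vdr --namespace=kube-system            # events, restart reasons, container exit codes
kubectl logs weave-net-95vdr -c weave --namespace=kube-system --previous  # logs from the last crashed instance of the container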

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 10
  • Comments: 43 (4 by maintainers)

Most upvoted comments

I ran “sudo rpi-update” and restarted, after that the “weave-net” pod status changed to Running. I guess it fixed the problem.
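Roughly the sequence that worked here, as a sketch (rpi-update pulls a bleeding-edge kernel, so the exact versions you see will differ):

uname -r                                    # note the running kernel version
sudo rpi-update                             # update firmware and kernel to the latest build
sudo reboot

uname -r                                    # after the reboot: the kernel version should have changed
kubectl get pods --namespace=kube-system    # the weave-net pod should eventually reach Running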

same here on rpi3 & 4

Not sure if it is related, but although the weave-net pod is running after rpi-update, I still have issues with routing. Currently I’m trying to install MetalLB, and its controller pod has errors when trying to connect to the Kubernetes API.

I think I found the reason. My Raspberry Pi uses iptables 1.8.2, but “Weave Net does not work on hosts running iptables 1.8 or above” (see here); see also this weave issue.

$ iptables --version
iptables v1.8.2 (nf_tables)

So I ran the commands below from the Kubernetes docs:

sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy

And now it works.
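A quick way to confirm the switch took effect (sketch; the version number will vary, but the mode in parentheses should change from nf_tables to legacy):

$ update-alternatives --display iptables    # should now point at /usr/sbin/iptables-legacy
$ sudo iptables --version
iptables v1.8.2 (legacy)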

Like others, I started having this problem too. I have a perfectly working four-node cluster (1 master, 3 workers), all based on Raspbian Buster running on RPi4B w/ 4GB. It appeared to start happening on one of the nodes after ‘apt upgrade && apt update’ one day recently. So, I deleted the broken node from the cluster, and then re-imaged the SD card with the base image I had initially used to create each node (circa Aug 15 2019). After re-joining the node to the kube cluster, the weave docker images were re-pulled (as a result of the daemonset placing a weave pod on the node once again), and all is working once again. So, I think it has more to do with some change introduced in recent Raspbian Buster updates.

For me, systems running these kernel versions DO WORK (uname -r):

4.19.57-v7l+
4.19.66-v7l+

A system running this kernel version (or later) DOES NOT work:

4.19.75-v7l+

For me, it looks like the SEGV is happening very soon after the prog/weave-kube/launch.sh script starts running. I stuck a ‘set -x’ near the beginning of that script to see how far it gets. The SEGV happens on the call to /home/weave/weaver at line 103 of the script, on this line:

PEERNAME=$(/home/weave/weaver $EXTRA_ARGS --print-peer-name --host-root=$HOST_ROOT --db-prefix="$DB_PREFIX")

That appears to be a call to a bit of Go code in prog/weaver/main.go. Looking at that code, I’d guess the SEGV is happening during the call to the “peerName” function on line 243, because that Go code never returns the MAC address of the ‘weave’ virtual bridge interface like it’s supposed to (probably because something prevented the creation of the bridge in the first place). This bit (lines 243 - 247):

name := peerName(routerName, bridgeConfig.WeaveBridgeName, dbPrefix, hostRoot)
if justPeerName {
	fmt.Printf("%s\n", name)
	os.Exit(0)
}

Really reaching now… Looking at the peerName function starting on line 708 of prog/weaver/main.go, I’d further conjecture the problem is happening on the call to net.InterfaceByName(bridgeName) on line 710.
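One way to sanity-check that conjecture from the host (a sketch; ‘weave’ is the bridge name referenced above) is to see whether the bridge interface was ever created, since net.InterfaceByName would have nothing to look up if it wasn’t:

ip link show weave                   # prints link details (including the MAC) if the bridge exists, fails with "does not exist" otherwise
cat /sys/class/net/weave/address     # equivalent check via sysfs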

rpi-update also fixed my problem, would be good to know why though.

@hakan458 and other interested parties. For the sake of posterity…

A change was introduced in the upstream Linux kernel, which then found its way into the RPi kernel (by way of commit c0ccb4d). That change then made its way into the stable apt channel sometime around the time the 4.19.75-xxx RPi kernel was released. That change is ultimately what causes the SEGV.

The good news is that the change has subsequently been reverted (in commit 68a2665). The bad news is that the revert has not yet made its way into the stable apt channel. To date, it is only available in recent kernel builds delivered via ‘rpi-update’. Until that revert lands in a kernel on the stable apt channel, the only way to get it is to do an ‘rpi-update’. This is generally considered a risky thing to do, because other breaking changes may be introduced in the bleeding-edge kernel delivered via ‘rpi-update’.

I posed the question to the kernel maintainers (here) to see when we might expect commit 68a2665 to find its way into the stable apt channel. Unfortunately, it appears there are other dependencies holding things up at the moment. Hopefully we’ll see the log-jam free up soon.

@christianreddington the latest kernel version is not 75 (that one has the bug, as you can see above), so make sure you get 81.

This is the result:

$ sudo dpkg --list | grep raspberrypi-kernel; uname -a

ii  raspberrypi-kernel                    1.20190925+1-1                      armhf        Raspberry Pi bootloader
Linux homepi 4.19.80-v7+ #1275 SMP Mon Oct 28 18:27:03 GMT 2019 armv7l GNU/Linux