weave: Memory leak/OOM with "Received update for IP range I own" messages in log
What you expected to happen?
Memory usage of the weave process is expected to be stable and not grow unbounded over time.
What happened?
I had a stable 2.5.0 weave network in my Kubernetes 1.9 cluster of about 100 nodes. Weave was initially installed by kops and had a memory limit of 200MB set. There were no occurrences of “Received update for IP range I own” in the log files, and memory usage for the weave pods in the cluster had been very stable over time for weeks.
As part of refactoring some services, about 30 nodes were removed from the cluster (bringing the cluster size down to 71 nodes). After this, the memory usage of the weave pods started growing until it exceeded the memory limit, at which point the pod was OOM killed and restarted. These restarts cause brief disruption for the node on which the restart occurs. At this time the “Received update for IP range I own” message started appearing in the logs (although not from all pods; this nuance was not discovered until later).
After looking at some related tickets and such here (#3650, #3600, #2797), the following actions were taken:
- The “status ipam” output was checked and seen to have a lot of “unreachable” peers listed in it
- The unreachable nodes listed by “status ipam” were removed with rmpeer on one node. This did not fix all the unreachables on all the nodes, so the process of listing and removing unreachables was repeated on a couple of other systems before all systems showed all 71 nodes in the list, all reachable (a sketch of these commands follows this list).
- Updated to 2.5.2, as there were some related-looking tickets mentioned in that release
- Increased the memory limit (from 200MB to 1GB) so that OOM kills might happen less frequently
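For reference, the listing/removal above was done with commands along these lines (a sketch, not the exact invocations; it assumes the stock weave-net DaemonSet in kube-system, whose pods are labelled name=weave-net and whose weave container ships the /home/weave/weave helper script):

```sh
# Pick one weave pod to run the commands through (any node will do for listing).
POD=$(kubectl get pods -n kube-system -l name=weave-net -o name | head -n 1)
POD=${POD#pod/}

# Show the IPAM ring as this peer sees it; unreachable peers are listed here.
kubectl exec -n kube-system "$POD" -c weave -- /home/weave/weave --local status ipam

# Reclaim the ranges owned by a peer that is gone for good.
# <peer-name> is a placeholder for one of the unreachable names from the output above.
kubectl exec -n kube-system "$POD" -c weave -- /home/weave/weave --local rmpeer <peer-name>
```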
Weave pods continue to grow in memory usage; the new 2.5.2 pods have not hit their 1GB limit yet, but look to be heading that way. The “update for IP range I own” messages are still being seen; on closer inspection, however, these messages are coming from only 3 of the 71 pods.
How to reproduce it?
Take a working Kubernetes cluster and delete some nodes from it.
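For example, something like the following per node (a sketch only; the node name is a placeholder, and the underlying instance should also be terminated so the peer never comes back):

```sh
# Evict workloads, remove the node object, then terminate the instance itself.
kubectl drain <node-name> --ignore-daemonsets --delete-local-data
kubectl delete node <node-name>
```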
Anything else we need to know?
Versions:
Version: 2.5.2 (up to date; next check at 2019/07/12 18:43:12)
Service: router
Protocol: weave 1..2
Name: ea:38:6f:58:7b:81(ip-10-32-124-236.us-west-2.compute.internal)
Encryption: disabled
PeerDiscovery: enabled
Targets: 71
Connections: 71 (70 established, 1 failed)
Peers: 71 (with 4966 established, 4 pending connections)
TrustedSubnets: none
Service: ipam
Status: ready
Range: 100.96.0.0/11
DefaultSubnet: 100.96.0.0/11
admin@ip-10-32-92-49:~$ docker version
Client:
Version: 17.03.2-ce
API version: 1.27
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 02:09:56 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.2-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 02:09:56 2017
OS/Arch: linux/amd64
Experimental: false
Linux ip-10-32-92-49 4.4.121-k8s #1 SMP Sun Mar 11 19:39:47 UTC 2018 x86_64 GNU/Linux
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.8", GitCommit:"c138b85178156011dc934c2c9f4837476876fb07", GitTreeState:"clean", BuildDate:"2018-05-21T18:53:18Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Logs:
Attachments from one of the weave pods showing the “Received update for IP range I own” messages:
- Logs: weave-net-q56hl.log
- pprof/heap output: weave-net-q56hl.heap.gz
- status ipam output: weave-net-q56hl.ipam.txt
- status peers output: weave-net-q56hl.peers.txt
Attachments from one of the weave pods not showing that message:
- Logs: weave-net-9t7d8.log
- pprof/heap output: weave-net-9t7d8.heap.gz
And here’s a picture showing the history of memory usage from these pods.
About this issue
- State: closed
- Created 5 years ago
- Reactions: 2
- Comments: 16 (4 by maintainers)
OK, so since the weave nodes became consistent, things have been stable with no memory growth or OOM issues. I have done a minor amount of scaling (perhaps up/down by 10 nodes or so) and things have remained consistent throughout with 2.5.2.
I wrote a quick script verify-weave.sh which will go through all weave pods in the cluster, compute a checksum of the ‘status ipam’ output, and tell you whether any weave pods disagree on their peer list. We used this to identify the different groups of pods in the cluster and decide which ones we wanted to preserve and which ones we wanted to reset/restart. Another quick script bump-weave.sh was then used to remove the db file and restart the weave pods we wished to reset.
Both of those scripts were written quickly to address a particular condition here and so are not intended as portable or good examples of coding, but may be useful to someone so here they are.
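In outline, the verify check looks something like this (a rough sketch of the idea rather than the actual verify-weave.sh; the label selector, container name, and checksum command here are illustrative):

```sh
#!/bin/sh
# Print a checksum of each weave pod's 'status ipam' peer list so that pods
# holding a divergent view of the ring stand out at a glance.
for pod in $(kubectl get pods -n kube-system -l name=weave-net -o jsonpath='{.items[*].metadata.name}'); do
  sum=$(kubectl exec -n kube-system "$pod" -c weave -- /home/weave/weave --local status ipam \
          | awk '{print $1}' | sort | sha1sum | awk '{print $1}')
  printf '%s %s\n' "$pod" "$sum"
done
# Any pod that prints a different checksum disagrees about the peer list.
```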
@itskingori, @murali-reddy - thank you both for your help on this.
@itskingori - I will keep an eye on it for sure. We’re not doing a huge amount of scaling at the moment; however, we did just do some refactoring of instance types, hence a fairly large number of hosts were created and deleted, which is what set the current condition into motion. There are some more similar adjustments still outstanding, so I will check and let you know how things look as that happens.
Also, thanks very much for your feedback and detail - sounds like your process to recover normal operation was very similar to mine. The only difference seems to be that you rebooted/terminated the nodes whereas I just recycled the weave pod on the node and left the other pods and the node itself alone. We’ll be doing this same process in our prd cluster today so if I cook up a noteworthy script I will share it here. Cheers -
Yes, there are no ramifications. For now, these manual steps will reconcile the state where there were IPAM conflicts. In the 2.6 release, IPAM conflicts are resolved automatically (https://github.com/weaveworks/weave/pull/3637).
Yes, this is more or less the position we were in. I’ll borrow your commands, as they seem similar to (if not better than) what I have for looking through weave 👇
I use something like
./script.sh <cluster> "<status ipam>"
to loop through all the weave pods. Weave works by sharing state by consensus; the fact that there are two states breaks weave, and you need to bite the bullet and get rid of a group, i.e. whichever weave pods have the inconsistent state. There are two ways to do this:
I ran with no. 2 because this was production, so recovery was critical. I didn’t have time to write a script for no. 2. I pretty much figured out which group to terminate and terminated them all at the same time. I’m guessing it’s better to terminate weave while deleting the database on the host, so that the new pod starts from a clean slate and gets its state from the other, correct weave pods.
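A minimal sketch of that clean-slate reset for a single pod, assuming the standard weave-net manifest (where the weave container mounts the host's /var/lib/weave at /weavedb and the persisted state lives in weave-netdata.db; the pod name is a placeholder):

```sh
# Pod whose state should be thrown away (placeholder name).
POD=weave-net-xxxxx

# Remove the persisted IPAM/peer state, then delete the pod; the DaemonSet
# recreates it and the fresh pod learns the ring from the remaining healthy peers.
kubectl exec -n kube-system "$POD" -c weave -- rm -f /weavedb/weave-netdata.db
kubectl delete pod -n kube-system "$POD"
```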
I didn’t do anything to the remaining group as their state was similar and correct. Once you get rid of the bad weave pods, everything goes back to normal and the state is now consistent among those remaining … the problem goes away, and the cluster heals.
I didn’t do anything to the ones remaining. The fact that they had similar state was all I needed.
I terminated them all at the same time because I could not risk them sharing state any longer. I figured doing it one by one might not work, because a new pod might get its state from another bad one. I wanted all the ‘bad’ ones gone and only the ‘good’ ones left to share state.
It didn’t cause more breakage for me, other than the time spent waiting for the autoscaling group to replace the nodes I had just terminated, and the disruption to the apps on those nodes as new pods came up.
Here are the log files from the other two weave pods producing the “Received update for IP range I own” log messages - only 3 of the 71 pods are producing this message (the other is in the original comment)
weave-net-67hrw.log weave-net-cqfc9.log