weave: Weave not working correctly leads to containers stuck in ContainerCreating

What you expected to happen?

Weave should not keep memory of previously removed nodes, as this can cause an exhaustion of IPs.

What happened?

Some containers in the cluster were in status ContainerCreating and could never transition to Running. By describing one of the pods we could see that it was reporting "Failed create pod sandbox". There are a number of similar issues which could still be unrelated.

In our cluster we scale down the nodes every night to save money, by changing the size of the Auto Scaling Group (ASG) in AWS (it’s a kops cluster).
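
For context, the nightly scale-down is nothing more sophisticated than adjusting the ASG’s desired capacity. A rough sketch of the idea (the group name and sizes are placeholders; our actual tooling differs):

# Hypothetical example: scale the worker ASG to 0 at night and back up in the morning
aws autoscaling update-auto-scaling-group --auto-scaling-group-name nodes.example.k8s.local \
  --min-size 0 --desired-capacity 0
aws autoscaling update-auto-scaling-group --auto-scaling-group-name nodes.example.k8s.local \
  --min-size 3 --desired-capacity 3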

We saw the following in the weave containers:

for i in $(kubectl get pods -n kube-system | grep weave | awk '{ print $1}'); do kubectl get pods $i -o wide -n kube-system; kubectl exec -n kube-system $i -c weave -- /home/weave/weave --local status connections; done

<- 11.10.53.254:59238   established fastdp 9e:51:84:a9:2b:99(ip-10-11-53-254.eu-west-1.compute.internal) mtu=8912
<- 11.10.125.51:52928   established fastdp aa:52:36:e7:8d:d3(ip-10-11-125-51.eu-west-1.compute.internal) mtu=8912
<- 11.10.95.88:60391    established fastdp e2:a6:ae:06:8f:d1(ip-10-11-95-88.eu-west-1.compute.internal) mtu=8912
-> 11.10.51.247:6783    failed      cannot connect to ourself, retry: never
<- 11.10.53.254:33762   established fastdp 9e:51:84:a9:2b:99(ip-10-11-53-254.eu-west-1.compute.internal) mtu=8912
-> 11.10.51.247:6783    established fastdp 6a:a4:ca:68:f4:02(ip-10-11-51-247.eu-west-1.compute.internal) mtu=8912
<- 11.10.95.88:58856    established fastdp e2:a6:ae:06:8f:d1(ip-10-11-95-88.eu-west-1.compute.internal) mtu=8912
-> 11.10.125.51:6783    failed      cannot connect to ourself, retry: never
-> 11.10.51.247:6783    established fastdp 6a:a4:ca:68:f4:02(ip-10-11-51-247.eu-west-1.compute.internal) mtu=8912
-> 11.10.125.51:6783    established fastdp aa:52:36:e7:8d:d3(ip-10-11-125-51.eu-west-1.compute.internal) mtu=8912
-> 11.10.95.88:6783     established fastdp e2:a6:ae:06:8f:d1(ip-10-11-95-88.eu-west-1.compute.internal) mtu=8912
-> 11.10.53.254:6783    failed      cannot connect to ourself, retry: never
-> 11.10.125.51:6783    established fastdp aa:52:36:e7:8d:d3(ip-10-11-125-51.eu-west-1.compute.internal) mtu=8912
-> 11.10.51.247:6783    established fastdp 6a:a4:ca:68:f4:02(ip-10-11-51-247.eu-west-1.compute.internal) mtu=8912
<- 11.10.53.254:55665   established fastdp 9e:51:84:a9:2b:99(ip-10-11-53-254.eu-west-1.compute.internal) mtu=8912
-> 11.10.95.88:6783     failed      cannot connect to ourself, retry: never

This is NOT significant. It’s fine that the nodes report "cannot connect to ourself"; we see this error in the output of the weave CLI’s status connections command even on a working cluster.

What is more interesting is the output of status ipam:

kubectl exec -n kube-system weave-net-dcrj2 -c weave -- /home/weave/weave --local status ipam
9e:51:84:a9:2b:99(ip-10-11-53-254.eu-west-1.compute.internal)        7 IPs (00.0% of total) (7 active)
ba:3e:73:6a:13:c7()                        256 IPs (00.0% of total) - unreachable!
a2:6a:83:e2:d2:7e()                         32 IPs (00.0% of total) - unreachable!
32:a6:83:f6:c0:25()                       1024 IPs (00.0% of total) - unreachable!
8e:19:b3:42:4a:ec()                       2048 IPs (00.1% of total) - unreachable!
ba:bb:8e:64:d8:07()                       4096 IPs (00.2% of total) - unreachable!
da:f0:0a:b5:31:58()                     524288 IPs (25.0% of total) - unreachable!
ae:fc:8e:74:74:53()                       2048 IPs (00.1% of total) - unreachable!
3e:a3:6c:2c:68:9c()                         16 IPs (00.0% of total) - unreachable!
3e:bc:b5:42:15:66()                         32 IPs (00.0% of total) - unreachable!
26:87:a6:1f:4c:82()                       8192 IPs (00.4% of total) - unreachable!
82:cf:4e:23:3f:73()                       4096 IPs (00.2% of total) - unreachable!
ba:82:f4:d0:10:c5()                      32768 IPs (01.6% of total) - unreachable!
62:06:bf:fa:c8:b2()                       4096 IPs (00.2% of total) - unreachable!
5e:fd:cf:58:ce:01()                        256 IPs (00.0% of total) - unreachable!
5a:f7:3b:61:39:61()                         32 IPs (00.0% of total) - unreachable!
36:b5:90:80:65:88()                        512 IPs (00.0% of total) - unreachable!
36:91:10:1e:29:de()                       1024 IPs (00.0% of total) - unreachable!
62:0b:d4:f8:e1:51()                       4096 IPs (00.2% of total) - unreachable!
9a:7c:fa:51:3b:a9()                        192 IPs (00.0% of total) - unreachable!
da:c7:bd:46:98:c7()                       1024 IPs (00.0% of total) - unreachable!
e6:cf:6c:3e:fb:b0()                       2048 IPs (00.1% of total) - unreachable!
42:81:30:9e:df:0a()                        128 IPs (00.0% of total) - unreachable!
fe:77:8f:46:67:f4()                       1024 IPs (00.0% of total) - unreachable!
0e:85:43:e7:98:c2()                        512 IPs (00.0% of total) - unreachable!
3a:83:86:eb:df:da()                        128 IPs (00.0% of total) - unreachable!
16:41:a0:af:8c:3e()                        128 IPs (00.0% of total) - unreachable!
3e:8c:be:be:a7:0c()                         16 IPs (00.0% of total) - unreachable!
fa:88:5f:ea:c5:5f()                      65536 IPs (03.1% of total) - unreachable!
9a:ba:ce:4d:60:bd()                       1024 IPs (00.0% of total) - unreachable!
d6:ad:e3:03:aa:42()                         32 IPs (00.0% of total) - unreachable!
56:db:68:38:9b:5b()                         32 IPs (00.0% of total) - unreachable!
3a:0c:3c:e9:59:d8()                        128 IPs (00.0% of total) - unreachable!
b6:76:96:73:bc:6b()                       2048 IPs (00.1% of total) - unreachable!
1e:e8:8e:ad:fd:a9()                     262144 IPs (12.5% of total) - unreachable!
8a:f0:9a:e1:c7:29()                         32 IPs (00.0% of total) - unreachable!
e2:27:36:19:4e:c1()                      32768 IPs (01.6% of total) - unreachable!
0e:bf:ce:ac:ea:dd()                        256 IPs (00.0% of total) - unreachable!
8a:00:d6:3d:67:39()                        256 IPs (00.0% of total) - unreachable!
ae:03:57:54:c1:ec()                       2048 IPs (00.1% of total) - unreachable!
1a:0d:d2:ff:88:3b()                      32768 IPs (01.6% of total) - unreachable!
06:68:b6:87:48:75()                         64 IPs (00.0% of total) - unreachable!
9e:f4:4f:b3:77:07()                       8192 IPs (00.4% of total) - unreachable!
22:85:55:e9:07:e3()                         64 IPs (00.0% of total) - unreachable!
a6:cc:48:0b:42:8a()                        128 IPs (00.0% of total) - unreachable!
fa:2e:36:62:23:d9()                       1024 IPs (00.0% of total) - unreachable!
ae:c8:70:e0:23:22()                      49152 IPs (02.3% of total) - unreachable!
be:66:9a:85:fa:df()                         16 IPs (00.0% of total) - unreachable!
46:cb:ba:1c:b4:3a()                         16 IPs (00.0% of total) - unreachable!
fa:00:d3:e8:a4:f1()                      32768 IPs (01.6% of total) - unreachable!
8e:d7:cf:ff:97:69()                      16384 IPs (00.8% of total) - unreachable!
aa:52:36:e7:8d:d3(ip-10-11-125-51.eu-west-1.compute.internal)       19 IPs (00.0% of total)
fe:05:22:50:04:0a()                       2048 IPs (00.1% of total) - unreachable!
3e:91:da:4d:a9:ec()                     262144 IPs (12.5% of total) - unreachable!
82:a3:c7:f9:6d:e9()                        128 IPs (00.0% of total) - unreachable!
2e:8b:a6:cc:a7:19()                         32 IPs (00.0% of total) - unreachable!
2e:f7:59:91:b2:11()                       4096 IPs (00.2% of total) - unreachable!
c6:18:a6:97:97:4c()                      32768 IPs (01.6% of total) - unreachable!
56:ab:99:e9:91:fd()                      16384 IPs (00.8% of total) - unreachable!
7a:6d:41:17:b0:c3()                         20 IPs (00.0% of total) - unreachable!
c2:7f:f3:07:bf:48()                       2048 IPs (00.1% of total) - unreachable!
82:83:52:4f:34:f8()                     524288 IPs (25.0% of total) - unreachable!
6a:a4:ca:68:f4:02(ip-10-11-51-247.eu-west-1.compute.internal)        2 IPs (00.0% of total)
6a:09:6a:72:65:31()                         64 IPs (00.0% of total) - unreachable!
3a:fe:7d:61:b6:12()                         32 IPs (00.0% of total) - unreachable!
9e:93:78:0d:95:6f()                        512 IPs (00.0% of total) - unreachable!
a2:3e:3e:c8:40:34()                         16 IPs (00.0% of total) - unreachable!
82:68:49:b6:38:28()                       4096 IPs (00.2% of total) - unreachable!
c2:78:2d:27:b1:4d()                      16384 IPs (00.8% of total) - unreachable!
76:5f:e2:06:fa:35()                     131072 IPs (06.2% of total) - unreachable!

This seems to be telling us that most of the address space is owned by unreachable peers, which makes the CNI unable to work: containers can’t start because they can’t get an IP address. We verified that this was the case by reading the kubelet logs:

Aug 23 07:46:29 ip-10-11-53-254 kubelet[7383]: E0823 07:46:29.891765    7383 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Aug 23 07:46:29 ip-10-11-53-254 kubelet[7383]: E0823 07:46:29.891815    7383 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Aug 23 07:46:29 ip-10-11-53-254 kubelet[7383]: E0823 07:46:29.891832    7383 kuberuntime_manager.go:647] createPodSandbox for pod "nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Aug 23 07:46:29 ip-10-11-53-254 kubelet[7383]: E0823 07:46:29.891888    7383 pod_workers.go:186] Error syncing pod f6fe3f93-a6a6-11e8-80a5-0205d2a81076 ("nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)"), skipping: failed to "CreatePodSandbox" for "nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)\" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Aug 23 07:46:30 ip-10-11-53-254 kubelet[7383]: I0823 07:46:30.730025    7383 kuberuntime_manager.go:416] Sandbox for pod "nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)" has no IP address.  Need to start a new one
Aug 23 07:46:31 ip-10-11-53-254 kubelet[7383]: I0823 07:46:31.436352    7383 kubelet.go:1896] SyncLoop (PLEG): "nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)", event: &pleg.PodLifecycleEvent{ID:"f6fe3f93-a6a6-11e8-80a5-0205d2a81076", Type:"ContainerDied", Data:"da883b31b03187408bbee1b4642ba836932776977c200905fcb8e5f8cb9f4024"}
Aug 23 07:46:31 ip-10-11-53-254 kubelet[7383]: W0823 07:46:31.436438    7383 pod_container_deletor.go:77] Container "da883b31b03187408bbee1b4642ba836932776977c200905fcb8e5f8cb9f4024" not found in pod's containers
Aug 23 07:46:31 ip-10-11-53-254 kubelet[7383]: I0823 07:46:31.436465    7383 kubelet.go:1896] SyncLoop (PLEG): "nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)", event: &pleg.PodLifecycleEvent{ID:"f6fe3f93-a6a6-11e8-80a5-0205d2a81076", Type:"ContainerStarted", Data:"4deab2663ce209335c30401f003c0465401ef20604d32e2cfbd5ec6ab9b6b938"}
Aug 23 07:47:05 ip-10-11-53-254 kubelet[7383]: I0823 07:47:05.109777    7383 server.go:796] GET /stats/summary/: (3.458746ms) 200 [[Go-http-client/1.1] 11.10.125.51:38646]
Aug 23 07:48:05 ip-10-11-53-254 kubelet[7383]: I0823 07:48:05.027382    7383 server.go:796] GET /stats/summary/: (3.582405ms) 200 [[Go-http-client/1.1] 11.10.125.51:38646]
Aug 23 07:48:26 ip-10-11-53-254 kubelet[7383]: I0823 07:48:26.863628    7383 container_manager_linux.go:425] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service

In the logs above you can see "has no IP address. Need to start a new one".

We believe that this is due to the fact that we shut down the nodes of our cluster every night by simply scaling the ASG to 0 and back to the original size in the morning. It looks like kops/weave do not do any automatic cleanup, probably because they don’t get a chance.

From the weave documentation, it seems that we have to do something when a node exits, as mentioned in the official documentation. We still have to find a proper way to remove nodes from the Kubernetes cluster.
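
For reference, the weave CLI exposes an rmpeer command that is meant to reclaim the address space of a peer that is gone for good. A sketch of how that might be run against one of the unreachable peers listed above (untested on our side; the peer name is simply the first unreachable one from the status ipam output):

# Hypothetical example: reclaim the IP ranges owned by a permanently removed peer
kubectl exec -n kube-system weave-net-dcrj2 -c weave -- \
  /home/weave/weave --local rmpeer ba:3e:73:6a:13:c7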

We did the reset by doing the following:

  • ssh into the EC2 instances (masters and workers) one by one and delete the file /var/lib/weave/weave-netdata.db. There is no need to back up that file.
  • restart all the weave pods by deleting them, e.g.: for i in $(kubectl get pods -n kube-system | awk '{print $1}' | grep weave); do kubectl delete pod -n kube-system $i; done

This brought us back to a healthy state, which we could verify by running the weave status ipam command again:

k exec -it weave-net-47lhb -n kube-system -c weave /bin/sh
/home/weave # ./weave --local status ipam
9e:51:84:a9:2b:99(ip-172-20-53-254.eu-central-1.compute.internal)   524289 IPs (25.0% of total) (8 active)
6a:a4:ca:68:f4:02(ip-172-20-51-247.eu-central-1.compute.internal)   786411 IPs (37.5% of total)
aa:52:36:e7:8d:d3(ip-172-20-125-51.eu-central-1.compute.internal)   524307 IPs (25.0% of total)
e2:a6:ae:06:8f:d1(ip-172-20-95-88.eu-central-1.compute.internal)   262145 IPs (12.5% of total)

How to reproduce it?

Not sure; probably by continuously deleting lots of nodes from the cluster.

Anything else we need to know?

Versions:

$ weave version: 2.3.0
$ docker version

Client:
 Version:      17.03.2-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 03:35:14 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.2-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 03:35:14 2017
 OS/Arch:      linux/amd64
 Experimental: false

$ uname -a
Linux ip-172-20-95-88 4.4.0-1054-aws #63-Ubuntu SMP Wed Mar 28 19:42:42 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl version
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6", GitCommit:"9f8ebd171479bec0ada837d7ee641dec2f8c6dd1", GitTreeState:"clean", BuildDate:"2018-03-21T15:13:31Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Logs:

I don’t have other logs to paste for the moment.

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 59 (25 by maintainers)

Most upvoted comments

Hi,

We have a very high churn of nodes (a ~75-node cluster churning through about 1,000 nodes a day), and after about 2 weeks of running weave 2.5.0 on a kops-deployed 1.10 cluster we got this issue happening again.

We can’t share useful logs due to the huge timeframe and the node churn. If you have any idea about how we could share relevant information, please let me know.

We basically took @Raffo’s commands, put them in a script, and have this script run every 3 hours. This solved the issue and we have had no more incidents since December.

The relevant part of the script, if anyone needs it:

#!/bin/bash

NODES=$(kubectl get nodes -o template --template='{{range.items}}{{range.status.addresses}}{{if eq .type "InternalIP"}}{{.address}}{{end}}{{end}} {{end}}')

echo Starting NODES cleanup ...
for node in $NODES
do
      #echo $node
      ssh -t -o ConnectTimeout=10 -o StrictHostKeyChecking=no admin@$node "sudo rm /var/lib/weave/weave-netdata.db"
done

echo Starting WEAVE PODS cleanup ...
for weave_pod in $(kubectl get pods -n kube-system | awk '{print $1}' | grep weave)
do
      kubectl delete pod -n kube-system $weave_pod;
done
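
For completeness, a sketch of how one might schedule it (assuming the script is saved as /usr/local/bin/weave-cleanup.sh on a host that has kubectl configured and ssh access to the nodes; the path, log file and interval are placeholders):

# Hypothetical crontab entry: run the cleanup script every 3 hours
0 */3 * * * /usr/local/bin/weave-cleanup.sh >> /var/log/weave-cleanup.log 2>&1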

It might be as silly as enabling the weave-net ports 6783/tcp, 6783/udp and 6784/udp on the master node(s) in your firewall.
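
On AWS that could be a security group rule along these lines (the group ID is a placeholder; adapt it to your own firewall tooling):

# Hypothetical example: allow weave control (6783/tcp) and data (6783-6784/udp) traffic between nodes
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 6783 --source-group sg-0123456789abcdef0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol udp --port 6783-6784 --source-group sg-0123456789abcdef0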

@bboreham I have created this issue and re-posted relevant stuff for our issue

Please let me know if you need more information to make progress. From a technical viewpoint, I believe it is nearly certain that my new issue is in fact the exact same as this one (which is why I commented on it). They both have the exact same root cause and scenario: AWS nodes terminating and then coming back as part of an ASG.

So, the design has nothing to do with IP addresses.

How is the peer ID calculated? If the answer is the host name (or something that is unique within a cluster only by host name), then on AWS the IP address does in fact matter, because the only difference between node host names is the IP address; the host name is basically an IP address:

ip-172-25-19-155.ec2.internal
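
For what it’s worth, the peer ID and the nickname that weave actually uses can be inspected directly; a sketch, with the pod name as a placeholder:

# The "Name:" line of weave status shows the peer ID and its nickname (usually the host name)
kubectl exec -n kube-system weave-net-dcrj2 -c weave -- /home/weave/weave --local status | grep Name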

That said, I’m not sure it matters all that much. These arguments are only relevant because of guesses that I’ve made. Feel free to discard my comments and suppositions except for this one:

weave 2.4.0 does not clean up peers correctly, and it is repeatable, reproducible behavior when you terminate nodes within the same ASG.