weave: Weave not working correctly leads to containers stuck in ContainerCreating
What you expected to happen?
Weave should not retain memory of previously removed nodes, as this can cause an exhaustion of IPs.
What happened?
Some containers in the cluster were stuck in the ContainerCreating status and could never transition to Running. By describing one of the affected pods we could see that it was reporting Failed create pod sandbox. Here is a list of similar issues, which could still be unrelated:
- https://github.com/kubernetes/kubeadm/issues/578 (we don’t use kubeadm, but it is related)
- https://github.com/kubernetes/kops/issues/3575
- https://github.com/weaveworks/weave/issues/3310
- https://github.com/weaveworks/weave/issues/3300
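A rough way to spot the affected pods and read their events (illustrative only; the pod name and namespace are placeholders):
# Pods stuck in ContainerCreating are still in the Pending phase.
kubectl get pods --all-namespaces --field-selector=status.phase=Pending -o wide
# The "Failed create pod sandbox" message shows up in the events at the end of describe.
kubectl describe pod <pod-name> -n <namespace> | tail -n 20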
In our cluster we scale down the nodes every night to save money, by changing the size of the Autoscaling Group in AWS (it’s a kops cluster).
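For context, the nightly scale-down is roughly equivalent to something like the following (an illustrative sketch only; the ASG name and sizes are placeholders):
# Evening: scale the nodes ASG down to zero.
aws autoscaling update-auto-scaling-group --auto-scaling-group-name nodes.example.k8s.local --min-size 0 --max-size 0 --desired-capacity 0
# Morning: restore the original size.
aws autoscaling update-auto-scaling-group --auto-scaling-group-name nodes.example.k8s.local --min-size 3 --max-size 10 --desired-capacity 3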
We saw the following in weave containers:
for i in $(kubectl get pods -n kube-system | grep weave | awk '{ print $1}'); do kubectl get pods $i -o wide -n kube-system; kubectl exec -n kube-system $i -c weave -- /home/weave/weave --local status connections; done
<- 11.10.53.254:59238 established fastdp 9e:51:84:a9:2b:99(ip-10-11-53-254.eu-west-1.compute.internal) mtu=8912
<- 11.10.125.51:52928 established fastdp aa:52:36:e7:8d:d3(ip-10-11-125-51.eu-west-1.compute.internal) mtu=8912
<- 11.10.95.88:60391 established fastdp e2:a6:ae:06:8f:d1(ip-10-11-95-88.eu-west-1.compute.internal) mtu=8912
-> 11.10.51.247:6783 failed cannot connect to ourself, retry: never
<- 11.10.53.254:33762 established fastdp 9e:51:84:a9:2b:99(ip-10-11-53-254.eu-west-1.compute.internal) mtu=8912
-> 11.10.51.247:6783 established fastdp 6a:a4:ca:68:f4:02(ip-10-11-51-247.eu-west-1.compute.internal) mtu=8912
<- 11.10.95.88:58856 established fastdp e2:a6:ae:06:8f:d1(ip-10-11-95-88.eu-west-1.compute.internal) mtu=8912
-> 11.10.125.51:6783 failed cannot connect to ourself, retry: never
-> 11.10.51.247:6783 established fastdp 6a:a4:ca:68:f4:02(ip-10-11-51-247.eu-west-1.compute.internal) mtu=8912
-> 11.10.125.51:6783 established fastdp aa:52:36:e7:8d:d3(ip-10-11-125-51.eu-west-1.compute.internal) mtu=8912
-> 11.10.95.88:6783 established fastdp e2:a6:ae:06:8f:d1(ip-10-11-95-88.eu-west-1.compute.internal) mtu=8912
-> 11.10.53.254:6783 failed cannot connect to ourself, retry: never
-> 11.10.125.51:6783 established fastdp aa:52:36:e7:8d:d3(ip-10-11-125-51.eu-west-1.compute.internal) mtu=8912
-> 11.10.51.247:6783 established fastdp 6a:a4:ca:68:f4:02(ip-10-11-51-247.eu-west-1.compute.internal) mtu=8912
<- 11.10.53.254:55665 established fastdp 9e:51:84:a9:2b:99(ip-10-11-53-254.eu-west-1.compute.internal) mtu=8912
-> 11.10.95.88:6783 failed cannot connect to ourself, retry: never
This is NOT significant. The cannot connect to ourself errors are fine; we see them in the output of the status connections command of the weave CLI even on a working cluster.
What is more interesting is the output of status ipam:
kubectl exec -n kube-system weave-net-dcrj2 -c weave -- /home/weave/weave --local status ipam
9e:51:84:a9:2b:99(ip-10-11-53-254.eu-west-1.compute.internal) 7 IPs (00.0% of total) (7 active)
ba:3e:73:6a:13:c7() 256 IPs (00.0% of total) - unreachable!
a2:6a:83:e2:d2:7e() 32 IPs (00.0% of total) - unreachable!
32:a6:83:f6:c0:25() 1024 IPs (00.0% of total) - unreachable!
8e:19:b3:42:4a:ec() 2048 IPs (00.1% of total) - unreachable!
ba:bb:8e:64:d8:07() 4096 IPs (00.2% of total) - unreachable!
da:f0:0a:b5:31:58() 524288 IPs (25.0% of total) - unreachable!
ae:fc:8e:74:74:53() 2048 IPs (00.1% of total) - unreachable!
3e:a3:6c:2c:68:9c() 16 IPs (00.0% of total) - unreachable!
3e:bc:b5:42:15:66() 32 IPs (00.0% of total) - unreachable!
26:87:a6:1f:4c:82() 8192 IPs (00.4% of total) - unreachable!
82:cf:4e:23:3f:73() 4096 IPs (00.2% of total) - unreachable!
ba:82:f4:d0:10:c5() 32768 IPs (01.6% of total) - unreachable!
62:06:bf:fa:c8:b2() 4096 IPs (00.2% of total) - unreachable!
5e:fd:cf:58:ce:01() 256 IPs (00.0% of total) - unreachable!
5a:f7:3b:61:39:61() 32 IPs (00.0% of total) - unreachable!
36:b5:90:80:65:88() 512 IPs (00.0% of total) - unreachable!
36:91:10:1e:29:de() 1024 IPs (00.0% of total) - unreachable!
62:0b:d4:f8:e1:51() 4096 IPs (00.2% of total) - unreachable!
9a:7c:fa:51:3b:a9() 192 IPs (00.0% of total) - unreachable!
da:c7:bd:46:98:c7() 1024 IPs (00.0% of total) - unreachable!
e6:cf:6c:3e:fb:b0() 2048 IPs (00.1% of total) - unreachable!
42:81:30:9e:df:0a() 128 IPs (00.0% of total) - unreachable!
fe:77:8f:46:67:f4() 1024 IPs (00.0% of total) - unreachable!
0e:85:43:e7:98:c2() 512 IPs (00.0% of total) - unreachable!
3a:83:86:eb:df:da() 128 IPs (00.0% of total) - unreachable!
16:41:a0:af:8c:3e() 128 IPs (00.0% of total) - unreachable!
3e:8c:be:be:a7:0c() 16 IPs (00.0% of total) - unreachable!
fa:88:5f:ea:c5:5f() 65536 IPs (03.1% of total) - unreachable!
9a:ba:ce:4d:60:bd() 1024 IPs (00.0% of total) - unreachable!
d6:ad:e3:03:aa:42() 32 IPs (00.0% of total) - unreachable!
56:db:68:38:9b:5b() 32 IPs (00.0% of total) - unreachable!
3a:0c:3c:e9:59:d8() 128 IPs (00.0% of total) - unreachable!
b6:76:96:73:bc:6b() 2048 IPs (00.1% of total) - unreachable!
1e:e8:8e:ad:fd:a9() 262144 IPs (12.5% of total) - unreachable!
8a:f0:9a:e1:c7:29() 32 IPs (00.0% of total) - unreachable!
e2:27:36:19:4e:c1() 32768 IPs (01.6% of total) - unreachable!
0e:bf:ce:ac:ea:dd() 256 IPs (00.0% of total) - unreachable!
8a:00:d6:3d:67:39() 256 IPs (00.0% of total) - unreachable!
ae:03:57:54:c1:ec() 2048 IPs (00.1% of total) - unreachable!
1a:0d:d2:ff:88:3b() 32768 IPs (01.6% of total) - unreachable!
06:68:b6:87:48:75() 64 IPs (00.0% of total) - unreachable!
9e:f4:4f:b3:77:07() 8192 IPs (00.4% of total) - unreachable!
22:85:55:e9:07:e3() 64 IPs (00.0% of total) - unreachable!
a6:cc:48:0b:42:8a() 128 IPs (00.0% of total) - unreachable!
fa:2e:36:62:23:d9() 1024 IPs (00.0% of total) - unreachable!
ae:c8:70:e0:23:22() 49152 IPs (02.3% of total) - unreachable!
be:66:9a:85:fa:df() 16 IPs (00.0% of total) - unreachable!
46:cb:ba:1c:b4:3a() 16 IPs (00.0% of total) - unreachable!
fa:00:d3:e8:a4:f1() 32768 IPs (01.6% of total) - unreachable!
8e:d7:cf:ff:97:69() 16384 IPs (00.8% of total) - unreachable!
aa:52:36:e7:8d:d3(ip-10-11-125-51.eu-west-1.compute.internal) 19 IPs (00.0% of total)
fe:05:22:50:04:0a() 2048 IPs (00.1% of total) - unreachable!
3e:91:da:4d:a9:ec() 262144 IPs (12.5% of total) - unreachable!
82:a3:c7:f9:6d:e9() 128 IPs (00.0% of total) - unreachable!
2e:8b:a6:cc:a7:19() 32 IPs (00.0% of total) - unreachable!
2e:f7:59:91:b2:11() 4096 IPs (00.2% of total) - unreachable!
c6:18:a6:97:97:4c() 32768 IPs (01.6% of total) - unreachable!
56:ab:99:e9:91:fd() 16384 IPs (00.8% of total) - unreachable!
7a:6d:41:17:b0:c3() 20 IPs (00.0% of total) - unreachable!
c2:7f:f3:07:bf:48() 2048 IPs (00.1% of total) - unreachable!
82:83:52:4f:34:f8() 524288 IPs (25.0% of total) - unreachable!
6a:a4:ca:68:f4:02(ip-10-11-51-247.eu-west-1.compute.internal) 2 IPs (00.0% of total)
6a:09:6a:72:65:31() 64 IPs (00.0% of total) - unreachable!
3a:fe:7d:61:b6:12() 32 IPs (00.0% of total) - unreachable!
9e:93:78:0d:95:6f() 512 IPs (00.0% of total) - unreachable!
a2:3e:3e:c8:40:34() 16 IPs (00.0% of total) - unreachable!
82:68:49:b6:38:28() 4096 IPs (00.2% of total) - unreachable!
c2:78:2d:27:b1:4d() 16384 IPs (00.8% of total) - unreachable!
76:5f:e2:06:fa:35() 131072 IPs (06.2% of total) - unreachable!
This seems to be telling us that most of the cluster's IP space is unreachable, which prevents the CNI from working: containers can't start because they can't get an IP address. We verified that this was the case by reading the kubelet logs:
Aug 23 07:46:29 ip-10-11-53-254 kubelet[7383]: E0823 07:46:29.891765 7383 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Aug 23 07:46:29 ip-10-11-53-254 kubelet[7383]: E0823 07:46:29.891815 7383 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Aug 23 07:46:29 ip-10-11-53-254 kubelet[7383]: E0823 07:46:29.891832 7383 kuberuntime_manager.go:647] createPodSandbox for pod "nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Aug 23 07:46:29 ip-10-11-53-254 kubelet[7383]: E0823 07:46:29.891888 7383 pod_workers.go:186] Error syncing pod f6fe3f93-a6a6-11e8-80a5-0205d2a81076 ("nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)"), skipping: failed to "CreatePodSandbox" for "nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)\" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Aug 23 07:46:30 ip-10-11-53-254 kubelet[7383]: I0823 07:46:30.730025 7383 kuberuntime_manager.go:416] Sandbox for pod "nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)" has no IP address. Need to start a new one
Aug 23 07:46:31 ip-10-11-53-254 kubelet[7383]: I0823 07:46:31.436352 7383 kubelet.go:1896] SyncLoop (PLEG): "nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)", event: &pleg.PodLifecycleEvent{ID:"f6fe3f93-a6a6-11e8-80a5-0205d2a81076", Type:"ContainerDied", Data:"da883b31b03187408bbee1b4642ba836932776977c200905fcb8e5f8cb9f4024"}
Aug 23 07:46:31 ip-10-11-53-254 kubelet[7383]: W0823 07:46:31.436438 7383 pod_container_deletor.go:77] Container "da883b31b03187408bbee1b4642ba836932776977c200905fcb8e5f8cb9f4024" not found in pod's containers
Aug 23 07:46:31 ip-10-11-53-254 kubelet[7383]: I0823 07:46:31.436465 7383 kubelet.go:1896] SyncLoop (PLEG): "nginx-7dc755b6f7-kc5g8_custom(f6fe3f93-a6a6-11e8-80a5-0205d2a81076)", event: &pleg.PodLifecycleEvent{ID:"f6fe3f93-a6a6-11e8-80a5-0205d2a81076", Type:"ContainerStarted", Data:"4deab2663ce209335c30401f003c0465401ef20604d32e2cfbd5ec6ab9b6b938"}
Aug 23 07:47:05 ip-10-11-53-254 kubelet[7383]: I0823 07:47:05.109777 7383 server.go:796] GET /stats/summary/: (3.458746ms) 200 [[Go-http-client/1.1] 11.10.125.51:38646]
Aug 23 07:48:05 ip-10-11-53-254 kubelet[7383]: I0823 07:48:05.027382 7383 server.go:796] GET /stats/summary/: (3.582405ms) 200 [[Go-http-client/1.1] 11.10.125.51:38646]
Aug 23 07:48:26 ip-10-11-53-254 kubelet[7383]: I0823 07:48:26.863628 7383 container_manager_linux.go:425] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
In the logs above, note the message has no IP address. Need to start a new one.
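A quick way to quantify the problem (a sketch only, reusing the status ipam command from above) is to count the unreachable IPAM entries reported by each weave pod:
for i in $(kubectl get pods -n kube-system -o name | grep weave); do echo -n "$i: "; kubectl exec -n kube-system "${i#pod/}" -c weave -- /home/weave/weave --local status ipam | grep -c unreachable; done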
We believe this is due to the fact that we shut down the nodes of our cluster every night by simply scaling the ASG to 0, and scale it back to the original size in the morning. It looks like kops/weave do not do any automatic cleanup, probably because they don't get a chance to.
From the official weave documentation, it seems that we have to do something when a node exits. We still have to find a proper way to remove nodes from the Kubernetes cluster.
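For reference, weave has an rmpeer command meant to reclaim the IP space of a peer that is gone for good. A sketch of invoking it through one of the weave pods, using the peer that holds 25% of the range in the ipam output above (whether rmpeer alone would have been enough in this scenario is an assumption):
# Reclaim the space still registered to a dead peer (peer ID taken from status ipam above).
kubectl exec -n kube-system weave-net-dcrj2 -c weave -- /home/weave/weave --local rmpeer da:f0:0a:b5:31:58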
We did the reset by doing the following:
- ssh into the EC2 instances (masters and workers) one by one and delete the file /var/lib/weave/weave-netdata.db. There is no need for a backup of that file.
- restart all the weave pods by deleting them, i.e.:
for i in $(kubectl get pods -n kube-system | awk '{print $1}' | grep weave); do kubectl delete pod -n kube-system $i; done
This brought us back to a healthy state, which we could verify by running the status ipam weave command again:
k exec -it weave-net-47lhb -n kube-system -c weave /bin/sh
/home/weave # ./weave --local status ipam
9e:51:84:a9:2b:99(ip-172-20-53-254.eu-central-1.compute.internal) 524289 IPs (25.0% of total) (8 active)
6a:a4:ca:68:f4:02(ip-172-20-51-247.eu-central-1.compute.internal) 786411 IPs (37.5% of total)
aa:52:36:e7:8d:d3(ip-172-20-125-51.eu-central-1.compute.internal) 524307 IPs (25.0% of total)
e2:a6:ae:06:8f:d1(ip-172-20-95-88.eu-central-1.compute.internal) 262145 IPs (12.5% of total)
How to reproduce it?
Not sure; probably by continuously deleting lots of nodes from the cluster.
Anything else we need to know?
Versions:
$ weave version: 2.3.0
$ docker version
Client:
Version: 17.03.2-ce
API version: 1.27
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 03:35:14 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.2-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 03:35:14 2017
OS/Arch: linux/amd64
Experimental: false
$ uname -a
Linux ip-172-20-95-88 4.4.0-1054-aws #63-Ubuntu SMP Wed Mar 28 19:42:42 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6", GitCommit:"9f8ebd171479bec0ada837d7ee641dec2f8c6dd1", GitTreeState:"clean", BuildDate:"2018-03-21T15:13:31Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Logs:
I don’t have other logs to paste for the moment.
About this issue
- State: closed
- Created 6 years ago
- Reactions: 1
- Comments: 59 (25 by maintainers)
Hi,
We have a very high churn of nodes (a ~75-node cluster churning through about 1,000 nodes a day), and after about 2 weeks of running weave 2.5.0 on a kops-deployed 1.10 cluster we got this issue happening again.
We can't share useful logs due to the huge timeframe and the node churn. If you have any idea of how we could share relevant information, please let us know.
We basically took @Raffo's commands, put them in a script, and have this script run every 3 hours. This solved the issue and we have had no more incidents since December.
The relevant part of the script, if anyone needs it:
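A rough sketch of what such a periodic cleanup could look like, assuming it simply wraps the reset steps from the issue description (the SSH user, key, and node selection are placeholders, not necessarily what the script actually did):
# Intended to run every 3 hours (e.g. from cron). Assumes the weave persistence file
# is /var/lib/weave/weave-netdata.db, as in the issue description.
for node in $(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
  ssh -i ~/.ssh/placeholder-key admin@"$node" 'sudo rm -f /var/lib/weave/weave-netdata.db'
done
# Then restart the weave pods so they rebuild their state.
for i in $(kubectl get pods -n kube-system | awk '{print $1}' | grep weave); do kubectl delete pod -n kube-system $i; done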
It might be as silly as enabling the weave-net ports 6783/tcp, 6783/udp and 6784/udp on the master node(s) in your firewall.
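As an illustration (not part of the original comment), opening those ports with iptables on the master would look roughly like this; on AWS the equivalent is a set of security-group rules:
# Weave control and data-plane ports referenced above (run as root on the master).
iptables -A INPUT -p tcp --dport 6783 -j ACCEPT
iptables -A INPUT -p udp --dport 6783 -j ACCEPT
iptables -A INPUT -p udp --dport 6784 -j ACCEPT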
@bboreham I have created this issue and re-posted relevant stuff for our issue
Please let me know if you need more information to make progress. From a technical viewpoint, I believe it is nearly certain that my new issue is in fact the exact same as this one (which is why I commented on it). They both have the same exact root cause and scenario: AWS nodes terminating and then coming back as part of an ASG.
How is the peer ID calculated? If the answer is the host name (or something that is unique within a cluster only by host name), then on AWS the IP address does in fact matter, because the only difference between node host names is the IP address; the host name is basically an IP address:
ip-172-25-19-155.ec2.internal
That said, I'm not sure it matters all that much. These arguments are only relevant because of guesses that I've made. Feel free to discard my comments and suppositions except for this one:
weave 2.4.0 does not clean up peers correctly, and it is repeatable, reproducible behavior when you terminate nodes within the same ASG.