weave: Weave on one node within the cluster fails to connect to weave on other nodes.

Hi - great tool, thanks for developing it.

Is this a BUG REPORT? Yes

What did you expect to happen?

I expect weave running on each node to be able to connect to weave running on every other node.

What happened?

I’m running k8s via kops with two instance groups (e.g. apps and services) and two namespaces (e.g. dev and stable).

The initial symptom is that some of the applications running in the ‘apps’ instance group cannot connect to one of the services (nl-rmq), while others can. Looking into it, the endpoint listed by kubectl get endpoints differs from the IP address that those apps resolve. Running nslookup from any of the failing apps shows that kubernetes.default cannot be resolved.

Looking at the weave connections then shows that one of the ‘apps’ nodes cannot connect to any other peer:

root@ip-10-20-52-110:/home/admin# docker exec -it 7ebce22fcde2 ./weave --local status connections
-> 10.20.53.217:6783     failed      Received update for IP range I own at 100.96.0.0 v109: incoming message says owner ba:2d:dc:a6:b6:54 v110, retry: 2017-11-10 02:14:28.200474235 +0000 UTC 
-> 10.20.55.87:6783      failed      Received update for IP range I own at 100.96.0.0 v108: incoming message says owner ba:2d:dc:a6:b6:54 v110, retry: 2017-11-10 02:10:03.511181899 +0000 UTC 
-> 10.20.61.57:6783      failed      Received update for IP range I own at 100.96.0.0 v111: incoming message says owner ba:2d:dc:a6:b6:54 v112, retry: 2017-11-10 02:17:51.439273967 +0000 UTC 
-> 10.20.40.87:6783      failed      Received update for IP range I own at 100.96.0.0 v108: incoming message says owner ba:2d:dc:a6:b6:54 v110, retry: 2017-11-10 02:10:13.61068866 +0000 UTC 
-> 10.20.52.110:6783     failed      cannot connect to ourself, retry: never 
root@ip-10-20-52-110:/home/admin#
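On a larger cluster it helps to split these rows into address, state and reason before comparing nodes. The following is a minimal Go sketch that assumes the column layout shown above (direction arrow, address, state, then free-text info that may end in ", retry: <when>"); that layout is inferred from this output rather than from documented weave behaviour, and ParseConnections is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Connection holds one row of `weave status connections` output.
// The layout (arrow, address, state, info) is assumed from the sample above.
type Connection struct {
	Outbound bool   // true for "->" rows; "<-" rows are treated as inbound
	Address  string // e.g. "10.20.53.217:6783"
	State    string // e.g. "failed"
	Reason   string // info text with any trailing ", retry: ..." removed
	Retry    string // "never", a timestamp, or "" if the row has none
}

var rowRE = regexp.MustCompile(`^(->|<-)\s+(\S+)\s+(\S+)(?:\s+(.*))?$`)

// ParseConnections parses each matching row and ignores every other line,
// such as the shell prompt.
func ParseConnections(out string) []Connection {
	var conns []Connection
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		m := rowRE.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue
		}
		c := Connection{Outbound: m[1] == "->", Address: m[2], State: m[3], Reason: m[4]}
		const sep = ", retry: "
		if i := strings.LastIndex(c.Reason, sep); i >= 0 {
			c.Retry = c.Reason[i+len(sep):]
			c.Reason = c.Reason[:i]
		}
		conns = append(conns, c)
	}
	return conns
}

func main() {
	out := "-> 10.20.53.217:6783     failed      Received update for IP range I own at 100.96.0.0 v109: incoming message says owner ba:2d:dc:a6:b6:54 v110, retry: 2017-11-10 02:14:28.200474235 +0000 UTC \n" +
		"-> 10.20.52.110:6783     failed      cannot connect to ourself, retry: never \n"
	failed := 0
	for _, c := range ParseConnections(out) {
		if c.State == "failed" {
			failed++
			fmt.Printf("%s: %s (retry: %s)\n", c.Address, c.Reason, c.Retry)
		}
	}
	fmt.Println("failed connections:", failed)
}
```

Splitting off the retry suffix groups rows by the actual failure reason, so the same IPAM conflict reported against several peers is easy to spot.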

This explains why some apps work and others don’t: the failing ones must be running on the node that cannot connect to the rest of the weave network.

While submitting this report, I noticed that the weave log on the node that cannot connect to anything ends with “Killed”, whereas the weave logs on the other nodes don’t. So my current theory is that weave died on that node and Docker hasn’t restarted it effectively. I have seen Docker sometimes fail to restart applications in other projects I’ve worked on.
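A bare “Killed” line is typically what a shell prints when a command it started is terminated by SIGKILL; that this is where the line comes from here is an assumption. Under the usual shell convention, a process terminated by signal N is reported with exit status 128 + N, so SIGKILL (9) appears as 137. The Go sketch below checks both; EndsWithKilled and SignalFromExitCode are hypothetical helpers, and it assumes you have saved the previous container's log (as with kubectl logs -p) or read an exit code from the container's status.

```go
package main

import (
	"fmt"
	"strings"
)

// EndsWithKilled reports whether the last non-blank line of a saved log is
// exactly "Killed", as at the end of the bad node's weave log.
func EndsWithKilled(log string) bool {
	trimmed := strings.TrimRight(log, " \t\r\n")
	lines := strings.Split(trimmed, "\n")
	return strings.TrimSpace(lines[len(lines)-1]) == "Killed"
}

// SignalFromExitCode applies the shell convention that a status of 128+N
// means "terminated by signal N". It returns false for ordinary exit codes.
func SignalFromExitCode(code int) (int, bool) {
	if code > 128 && code < 256 {
		return code - 128, true
	}
	return 0, false
}

func main() {
	log := "INFO: 2017/11/09 08:25:40.081101 sleeve ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: Effective MTU verified at 8939\nKilled\n"
	fmt.Println("ends with Killed:", EndsWithKilled(log))
	if sig, ok := SignalFromExitCode(137); ok {
		fmt.Println("terminated by signal", sig) // 9 is SIGKILL
	}
}
```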

FYI, here is the complete ‘weave report’ output:

root@ip-10-20-52-110:/home/admin# docker exec -it 7ebce22fcde2 ./weave --local report
{
    "Ready": true,
    "Version": "2.0.1",
    "VersionCheck": {
        "Enabled": true,
        "Success": true,
        "NewVersion": "2.0.4",
        "NextCheckAt": "2017-11-10T03:20:12.10036294Z"
    },
    "Router": {
        "Protocol": "weave",
        "ProtocolMinVersion": 1,
        "ProtocolMaxVersion": 2,
        "Encryption": false,
        "PeerDiscovery": true,
        "Name": "72:07:06:06:16:3c",
        "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
        "Port": 6783,
        "Peers": [
            {
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "UID": 1470446513819984768,
                "ShortID": 1945,
                "Version": 4551,
                "Connections": null
            }
        ],
        "UnicastRoutes": [
            {
                "Dest": "72:07:06:06:16:3c",
                "Via": "00:00:00:00:00:00"
            }
        ],
        "BroadcastRoutes": [
            {
                "Source": "72:07:06:06:16:3c",
                "Via": null
            }
        ],
        "Connections": [
            {
                "Address": "10.20.52.110:6783",
                "Outbound": true,
                "State": "failed",
                "Info": "cannot connect to ourself, retry: never",
                "Attrs": null
            },
            {
                "Address": "10.20.53.217:6783",
                "Outbound": true,
                "State": "failed",
                "Info": "Received update for IP range I own at 100.96.0.0 v109: incoming message says owner ba:2d:dc:a6:b6:54 v110, retry: 2017-11-10 02:14:28.200474235 +0000 UTC",
                "Attrs": null
            },
            {
                "Address": "10.20.55.87:6783",
                "Outbound": true,
                "State": "failed",
                "Info": "Inconsistent entries for 100.96.0.0: owned by 72:07:06:06:16:3c but incoming message says ba:2d:dc:a6:b6:54, retry: 2017-11-10 02:18:34.565171551 +0000 UTC",
                "Attrs": null
            },
            {
                "Address": "10.20.61.57:6783",
                "Outbound": true,
                "State": "failed",
                "Info": "Received update for IP range I own at 100.96.0.0 v111: incoming message says owner ba:2d:dc:a6:b6:54 v112, retry: 2017-11-10 02:17:51.439273967 +0000 UTC",
                "Attrs": null
            },
            {
                "Address": "10.20.40.87:6783",
                "Outbound": true,
                "State": "failed",
                "Info": "Inconsistent entries for 100.96.0.0: owned by 72:07:06:06:16:3c but incoming message says ba:2d:dc:a6:b6:54, retry: 2017-11-10 02:16:17.031757342 +0000 UTC",
                "Attrs": null
            }
        ],
        "TerminationCount": 2247,
        "Targets": [
            "10.20.55.87",
            "10.20.61.57",
            "10.20.40.87",
            "10.20.52.110",
            "10.20.53.217"
        ],
        "OverlayDiagnostics": {
            "fastdp": {
                "Vports": [
                    {
                        "ID": 0,
                        "Name": "datapath",
                        "TypeName": "internal"
                    },
                    {
                        "ID": 1,
                        "Name": "vethwe-datapath",
                        "TypeName": "netdev"
                    },
                    {
                        "ID": 2,
                        "Name": "vxlan-6784",
                        "TypeName": "vxlan"
                    }
                ],
                "Flows": [
                    {
                        "FlowKeys": [
                            "UnknownFlowKey{type: 22, key: 00000000, mask: 00000000}",
                            "UnknownFlowKey{type: 23, key: 0000, mask: 0000}",
                            "UnknownFlowKey{type: 25, key: 00000000000000000000000000000000, mask: 00000000000000000000000000000000}",
                            "InPortFlowKey{vport: 1}",
                            "EthernetFlowKey{src: 72:07:06:06:16:3c, dst: ff:ff:ff:ff:ff:ff}",
                            "UnknownFlowKey{type: 24, key: 00000000, mask: 00000000}"
                        ],
                        "Actions": [
                            "OutputAction{vport: 0}"
                        ],
                        "Packets": 5,
                        "Bytes": 210,
                        "Used": 5174287404
                    }
                ]
            },
            "sleeve": null
        },
        "TrustedSubnets": [],
        "Interface": "datapath (via ODP)",
        "CaptureStats": {
            "FlowMisses": 10276
        },
        "MACs": [
            {
                "Mac": "3a:35:78:c1:dd:16",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:09:42.898637041Z"
            },
            {
                "Mac": "2e:19:a6:11:a5:26",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:06:29.400019073Z"
            },
            {
                "Mac": "72:36:79:96:e7:a2",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:09:04.961552851Z"
            },
            {
                "Mac": "fa:24:09:21:1e:50",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:07:41.773292147Z"
            },
            {
                "Mac": "3a:d1:74:6d:f2:fd",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:06:42.288540538Z"
            },
            {
                "Mac": "c2:25:68:f1:3d:a7",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:07:45.283861303Z"
            },
            {
                "Mac": "b6:7a:70:32:c9:66",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:08:26.912862029Z"
            },
            {
                "Mac": "1e:88:3a:9b:0e:16",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:10:08.373468709Z"
            },
            {
                "Mac": "96:0f:22:f4:e1:ab",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:06:38.415710708Z"
            },
            {
                "Mac": "5e:fd:8a:96:cb:6c",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:05:18.201950783Z"
            },
            {
                "Mac": "72:07:06:06:16:3c",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:10:06.312314207Z"
            },
            {
                "Mac": "f2:de:67:7c:b9:f1",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:07:38.165868939Z"
            },
            {
                "Mac": "12:74:3c:3a:20:1e",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:08:55.624559842Z"
            },
            {
                "Mac": "ea:d6:45:d4:e0:b2",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:08:21.318234673Z"
            },
            {
                "Mac": "d6:80:6b:eb:87:b6",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:10:08.373861079Z"
            },
            {
                "Mac": "9e:c8:60:03:b5:08",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:09:47.367737852Z"
            },
            {
                "Mac": "0e:52:05:4e:46:ef",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:08:53.236114159Z"
            },
            {
                "Mac": "4e:e8:db:55:0b:dc",
                "Name": "72:07:06:06:16:3c",
                "NickName": "ip-10-20-52-110.us-west-1.compute.internal",
                "LastSeen": "2017-11-10T02:10:53.760370024Z"
            }
        ]
    },
    "IPAM": {
        "Paxos": null,
        "Range": "100.96.0.0/11",
        "RangeNumIPs": 2097152,
        "ActiveIPs": 22,
        "DefaultSubnet": "100.96.0.0/11",
        "Entries": [
            {
                "Token": "100.96.0.0",
                "Size": 262144,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 112
            },
            {
                "Token": "100.100.0.0",
                "Size": 262144,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 19
            },
            {
                "Token": "100.104.0.0",
                "Size": 262144,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 1
            },
            {
                "Token": "100.108.0.0",
                "Size": 262144,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 25
            },
            {
                "Token": "100.112.0.0",
                "Size": 262144,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 7
            },
            {
                "Token": "100.116.0.0",
                "Size": 65536,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 82
            },
            {
                "Token": "100.117.0.0",
                "Size": 16384,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 20
            },
            {
                "Token": "100.117.64.0",
                "Size": 4096,
                "Peer": "a6:e9:b1:63:0c:60",
                "Nickname": "ip-10-20-53-217.us-west-1.compute.internal",
                "IsKnownPeer": false,
                "Version": 8
            },
            {
                "Token": "100.117.80.0",
                "Size": 3072,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 282
            },
            {
                "Token": "100.117.92.0",
                "Size": 1024,
                "Peer": "9a:f1:cb:a7:f9:87",
                "Nickname": "ip-10-20-61-57.us-west-1.compute.internal",
                "IsKnownPeer": false,
                "Version": 1
            },
            {
                "Token": "100.117.96.0",
                "Size": 4096,
                "Peer": "a6:e9:b1:63:0c:60",
                "Nickname": "ip-10-20-53-217.us-west-1.compute.internal",
                "IsKnownPeer": false,
                "Version": 0
            },
            {
                "Token": "100.117.112.0",
                "Size": 4096,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 98
            },
            {
                "Token": "100.117.128.0",
                "Size": 16384,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 1
            },
            {
                "Token": "100.117.192.0",
                "Size": 12288,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 44
            },
            {
                "Token": "100.117.240.0",
                "Size": 4096,
                "Peer": "e2:cd:be:ab:a6:d1",
                "Nickname": "ip-10-20-40-87.us-west-1.compute.internal",
                "IsKnownPeer": false,
                "Version": 16
            },
            {
                "Token": "100.118.0.0",
                "Size": 65536,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 1
            },
            {
                "Token": "100.119.0.0",
                "Size": 49152,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 6
            },
            {
                "Token": "100.119.192.0",
                "Size": 16384,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 4
            },
            {
                "Token": "100.120.0.0",
                "Size": 131072,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 2
            },
            {
                "Token": "100.122.0.0",
                "Size": 131072,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 207
            },
            {
                "Token": "100.124.0.0",
                "Size": 262144,
                "Peer": "72:07:06:06:16:3c",
                "Nickname": "ip-10-20-52-110.us-west-1.compute.internal",
                "IsKnownPeer": true,
                "Version": 34
            }
        ],
        "PendingClaims": null,
        "PendingAllocates": null
    }
}
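The IPAM section of a report this long is easier to read when aggregated. The following Go sketch decodes only a few fields (names taken from the JSON keys above) and totals how many addresses each peer owns in this node's view of the ring, along with the number of failed connections. Report, Summary and Summarize are hypothetical names, and the embedded sample is a trimmed excerpt of the report, not the full output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Report declares only the fields of `weave --local report` used here;
// encoding/json ignores all other keys.
type Report struct {
	Router struct {
		Name        string
		Connections []struct{ Address, State string }
	}
	IPAM struct {
		Range       string
		RangeNumIPs int
		Entries     []struct {
			Token       string
			Size        int
			Peer        string
			Nickname    string
			IsKnownPeer bool
		}
	}
}

// Summary is what this node believes about its connections and the ring.
type Summary struct {
	Self        string
	FailedConns int
	RangeNumIPs int
	Total       int               // sum of all entry sizes
	OwnedBy     map[string]int    // peer name -> addresses owned
	Unknown     map[string]string // peers with IsKnownPeer=false -> nickname
}

func Summarize(raw []byte) (Summary, error) {
	var r Report
	if err := json.Unmarshal(raw, &r); err != nil {
		return Summary{}, err
	}
	s := Summary{
		Self:        r.Router.Name,
		RangeNumIPs: r.IPAM.RangeNumIPs,
		OwnedBy:     map[string]int{},
		Unknown:     map[string]string{},
	}
	for _, c := range r.Router.Connections {
		if c.State == "failed" {
			s.FailedConns++
		}
	}
	for _, e := range r.IPAM.Entries {
		s.OwnedBy[e.Peer] += e.Size
		s.Total += e.Size
		if !e.IsKnownPeer {
			s.Unknown[e.Peer] = e.Nickname
		}
	}
	return s, nil
}

func main() {
	// Trimmed excerpt of the report above.
	sample := `{"Router": {"Name": "72:07:06:06:16:3c",
	  "Connections": [{"Address": "10.20.52.110:6783", "State": "failed"},
	                  {"Address": "10.20.53.217:6783", "State": "failed"}]},
	 "IPAM": {"Range": "100.96.0.0/11", "RangeNumIPs": 2097152, "Entries": [
	  {"Token": "100.117.64.0", "Size": 4096, "Peer": "a6:e9:b1:63:0c:60",
	   "Nickname": "ip-10-20-53-217.us-west-1.compute.internal", "IsKnownPeer": false},
	  {"Token": "100.117.80.0", "Size": 3072, "Peer": "72:07:06:06:16:3c",
	   "Nickname": "ip-10-20-52-110.us-west-1.compute.internal", "IsKnownPeer": true}]}}`
	s, err := Summarize([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Printf("failed connections: %d\n", s.FailedConns)
	fmt.Printf("self %s owns %d of %d addresses\n", s.Self, s.OwnedBy[s.Self], s.RangeNumIPs)
	for peer, nick := range s.Unknown {
		fmt.Printf("unknown peer %s (%s) owns %d\n", peer, nick, s.OwnedBy[peer])
	}
}
```

Summing the Size values in the full report above attributes 2,083,840 of the 2,097,152 addresses in 100.96.0.0/11 to the local peer 72:07:06:06:16:3c and the remaining 13,312 to three peers marked IsKnownPeer: false. The peer ba:2d:dc:a6:b6:54 named in the connection errors does not appear in these entries at all.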

How to reproduce it?

This is difficult to reproduce: it happens occasionally, and after a reboot it works again for a while.

Anything else we need to know?

Running on AWS, using kops to set up two instance groups and two namespaces.

Versions:

root@ip-10-20-52-110:/home/admin# docker exec -it 7ebce22fcde2 ./weave version
weave script 2.0.1
admin@ip-10-20-52-110:~$ docker --version
Docker version 1.12.6, build 78d1802
admin@ip-10-20-52-110:~$ uname -a
Linux ip-10-20-52-110 4.4.78-k8s #1 SMP Fri Jul 28 01:28:39 UTC 2017 x86_64 GNU/Linux
dev[~] : kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.1", GitCommit:"1dc5c66f5dd61da08412a74221ecc79208c2165b", GitTreeState:"clean", BuildDate:"2017-07-14T02:00:46Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.2", GitCommit:"922a86cfcd65915a9b2f69f3f193b8907d741d9c", GitTreeState:"clean", BuildDate:"2017-07-21T08:08:00Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Logs:

On the 'bad' node (10.20.52.110) it is:

dev[~] : kubectl -n kube-system logs --tail=50 -p weave-net-5r52m weave 
INFO: 2017/11/09 08:25:09.791462 ->[10.20.61.57:45119|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection deleted
INFO: 2017/11/09 08:25:09.792779 ->[10.20.61.57:6783] attempting connection
INFO: 2017/11/09 08:25:09.872083 ->[10.20.61.57:49453] connection accepted
INFO: 2017/11/09 08:25:09.872667 ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection ready; using protocol version 2
INFO: 2017/11/09 08:25:09.872719 overlay_switch ->[9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/09 08:25:09.872740 ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection added
INFO: 2017/11/09 08:25:09.873848 ->[10.20.61.57:49453|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection ready; using protocol version 2
INFO: 2017/11/09 08:25:09.873938 overlay_switch ->[9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/09 08:25:09.873954 ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection deleted
INFO: 2017/11/09 08:25:09.873996 ->[10.20.61.57:49453|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection added
INFO: 2017/11/09 08:25:09.874071 ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection shutting down due to error: Multiple connections to 9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal) added to 72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)
INFO: 2017/11/09 08:25:09.892937 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2017/11/09 08:25:09.893023 overlay_switch ->[9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)] using sleeve
INFO: 2017/11/09 08:25:09.893044 ->[10.20.61.57:49453|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection fully established
INFO: 2017/11/09 08:25:09.894249 sleeve ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: Effective MTU verified at 8939
INFO: 2017/11/09 08:25:10.382102 overlay_switch ->[9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/09 08:25:21.665283 ->[10.20.40.87:6783|e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)]: connection shutting down due to error: Received update for IP range I own at 100.117.80.0 v282: incoming message says owner ba:2d:dc:a6:b6:54 v347
INFO: 2017/11/09 08:25:21.665394 ->[10.20.40.87:6783|e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)]: connection deleted
INFO: 2017/11/09 08:25:21.666149 ->[10.20.40.87:6783] attempting connection
INFO: 2017/11/09 08:25:21.765540 ->[10.20.40.87:59932] connection accepted
INFO: 2017/11/09 08:25:21.805880 ->[10.20.40.87:6783|e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)]: connection ready; using protocol version 2
INFO: 2017/11/09 08:25:21.805948 overlay_switch ->[e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/09 08:25:21.805967 ->[10.20.40.87:6783|e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)]: connection added
INFO: 2017/11/09 08:25:21.806588 ->[10.20.40.87:59932|e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)]: connection ready; using protocol version 2
INFO: 2017/11/09 08:25:21.806628 overlay_switch ->[e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/09 08:25:21.806646 ->[10.20.40.87:59932|e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)]: connection shutting down due to error: Multiple connections to e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal) added to 72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)
INFO: 2017/11/09 08:25:22.337257 ->[10.20.40.87:6783|e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)]: connection fully established
INFO: 2017/11/09 08:25:22.356771 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2017/11/09 08:25:22.389047 sleeve ->[10.20.40.87:6783|e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)]: Effective MTU verified at 8939
INFO: 2017/11/09 08:25:27.271991 ->[10.20.53.217:6783|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection shutting down due to error: Received update for IP range I own at 100.117.80.0 v282: incoming message says owner ba:2d:dc:a6:b6:54 v347
INFO: 2017/11/09 08:25:27.272092 ->[10.20.53.217:6783|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection deleted
INFO: 2017/11/09 08:25:27.398173 ->[10.20.53.217:57562] connection accepted
INFO: 2017/11/09 08:25:27.450109 ->[10.20.53.217:57562|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection ready; using protocol version 2
INFO: 2017/11/09 08:25:27.450186 overlay_switch ->[a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/09 08:25:27.450213 ->[10.20.53.217:57562|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection added
INFO: 2017/11/09 08:25:27.531341 ->[10.20.53.217:57562|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection fully established
INFO: 2017/11/09 08:25:28.021528 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2017/11/09 08:25:28.122828 sleeve ->[10.20.53.217:6783|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: Effective MTU verified at 8939
INFO: 2017/11/09 08:25:39.777949 ->[10.20.61.57:49453|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection shutting down due to error: Received update for IP range I own at 100.117.80.0 v282: incoming message says owner ba:2d:dc:a6:b6:54 v347
INFO: 2017/11/09 08:25:39.778071 ->[10.20.61.57:49453|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection deleted
INFO: 2017/11/09 08:25:39.879253 ->[10.20.61.57:6783] attempting connection
INFO: 2017/11/09 08:25:39.899160 ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection ready; using protocol version 2
INFO: 2017/11/09 08:25:39.899265 overlay_switch ->[9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/09 08:25:39.899297 ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection added
INFO: 2017/11/09 08:25:39.942346 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2017/11/09 08:25:39.942483 overlay_switch ->[9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)] using sleeve
INFO: 2017/11/09 08:25:39.942508 ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection fully established
INFO: 2017/11/09 08:25:39.959192 overlay_switch ->[9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/09 08:25:40.081101 sleeve ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: Effective MTU verified at 8939
Killed

On another node (10.20.55.87) it is:

dev[~] : kubectl -n kube-system logs --tail=50 -p weave-net-268xw weave 
INFO: 2017/11/06 05:11:35.801663 overlay_switch ->[a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/06 05:11:35.801684 ->[10.20.53.217:56563|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection added (new peer)
INFO: 2017/11/06 05:11:35.801784 ->[10.20.53.217:56563|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection shutting down due to error: read tcp4 10.20.55.87:6783->10.20.53.217:56563: read: connection reset by peer
INFO: 2017/11/06 05:11:35.801810 ->[10.20.53.217:56563|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection deleted
INFO: 2017/11/06 05:11:35.801819 Removed unreachable peer a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)
INFO: 2017/11/06 05:11:45.167553 ->[10.20.53.217:6783] attempting connection
INFO: 2017/11/06 05:11:45.448088 ->[10.20.53.217:6783|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection ready; using protocol version 2
INFO: 2017/11/06 05:11:45.448158 overlay_switch ->[a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/06 05:11:45.448183 ->[10.20.53.217:6783|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection added (new peer)
INFO: 2017/11/06 05:11:45.846224 ->[10.20.52.110:33273] connection accepted
INFO: 2017/11/06 05:11:45.989033 ->[10.20.53.217:6783|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection shutting down due to error: Received update for IP range I own at 100.117.80.0 v280: incoming message says owner 72:07:06:06:16:3c v282
INFO: 2017/11/06 05:11:45.989178 ->[10.20.53.217:6783|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection deleted
INFO: 2017/11/06 05:11:45.989198 Removed unreachable peer 72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)
INFO: 2017/11/06 05:11:45.989205 Removed unreachable peer e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)
INFO: 2017/11/06 05:11:45.989210 Removed unreachable peer a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)
INFO: 2017/11/06 05:11:45.989216 Removed unreachable peer 9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)
INFO: 2017/11/06 05:11:45.989575 ->[10.20.52.110:33273|72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)]: connection ready; using protocol version 2
INFO: 2017/11/06 05:11:45.989632 overlay_switch ->[72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/06 05:11:45.989649 ->[10.20.52.110:33273|72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)]: connection added (new peer)
INFO: 2017/11/06 05:11:46.505328 ->[10.20.52.110:33273|72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)]: connection shutting down due to error: read tcp4 10.20.55.87:6783->10.20.52.110:33273: read: connection reset by peer
INFO: 2017/11/06 05:11:46.505493 ->[10.20.52.110:33273|72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)]: connection deleted
INFO: 2017/11/06 05:11:46.505519 Removed unreachable peer 72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)
INFO: 2017/11/06 05:11:48.494273 Discovered local MAC 7e:f3:74:a9:30:84
INFO: 2017/11/06 05:12:07.264884 ->[10.20.52.110:6783] attempting connection
INFO: 2017/11/06 05:12:07.471203 ->[10.20.61.57:6783] attempting connection
INFO: 2017/11/06 05:12:07.557285 ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection ready; using protocol version 2
INFO: 2017/11/06 05:12:07.557399 overlay_switch ->[9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/06 05:12:07.557456 ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection added (new peer)
INFO: 2017/11/06 05:12:07.558784 ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection shutting down due to error: Received update for IP range I own at 100.117.80.0 v280: incoming message says owner 72:07:06:06:16:3c v282
INFO: 2017/11/06 05:12:07.558934 ->[10.20.61.57:6783|9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)]: connection deleted
INFO: 2017/11/06 05:12:07.558970 Removed unreachable peer 9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)
INFO: 2017/11/06 05:12:07.558986 Removed unreachable peer e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)
INFO: 2017/11/06 05:12:07.559000 Removed unreachable peer a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)
INFO: 2017/11/06 05:12:07.559013 Removed unreachable peer 72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)
INFO: 2017/11/06 05:12:07.598062 ->[10.20.52.110:6783|72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)]: connection ready; using protocol version 2
INFO: 2017/11/06 05:12:07.598224 overlay_switch ->[72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/06 05:12:07.598344 ->[10.20.52.110:6783|72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)]: connection added (new peer)
INFO: 2017/11/06 05:12:07.599366 ->[10.20.52.110:6783|72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)]: connection shutting down due to error: Received update for IP range I own at 100.117.80.0 v280: incoming message says owner 72:07:06:06:16:3c v282
INFO: 2017/11/06 05:12:07.599499 ->[10.20.52.110:6783|72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)]: connection deleted
INFO: 2017/11/06 05:12:07.599535 Removed unreachable peer 72:07:06:06:16:3c(ip-10-20-52-110.us-west-1.compute.internal)
INFO: 2017/11/06 05:12:07.599559 Removed unreachable peer 9a:f1:cb:a7:f9:87(ip-10-20-61-57.us-west-1.compute.internal)
INFO: 2017/11/06 05:12:07.599577 Removed unreachable peer e2:cd:be:ab:a6:d1(ip-10-20-40-87.us-west-1.compute.internal)
INFO: 2017/11/06 05:12:07.599595 Removed unreachable peer a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)
INFO: 2017/11/06 05:12:07.654424 ->[10.20.53.217:55149] connection accepted
INFO: 2017/11/06 05:12:07.735839 ->[10.20.53.217:55149|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection ready; using protocol version 2
INFO: 2017/11/06 05:12:07.736150 overlay_switch ->[a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)] using fastdp
INFO: 2017/11/06 05:12:07.857132 ->[10.20.53.217:55149|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection added (new peer)
INFO: 2017/11/06 05:12:07.857322 ->[10.20.53.217:55149|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection shutting down due to error: read tcp4 10.20.55.87:6783->10.20.53.217:55149: read: connection reset by peer
INFO: 2017/11/06 05:12:07.857354 ->[10.20.53.217:55149|a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)]: connection deleted
INFO: 2017/11/06 05:12:07.857366 Removed unreachable peer a6:e9:b1:63:0c:60(ip-10-20-53-217.us-west-1.compute.internal)

Note that the other node's log does not end with “Killed”. Perhaps weave-net on the bad node was killed and not restarted properly, which could be a problem with Docker not restarting it effectively.
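Both logs, like the report, show connections being dropped with two IPAM errors: “Received update for IP range I own” and “Inconsistent entries for”. The Go sketch below is a simplified model of the per-token rule those two messages imply, assuming each ring entry carries an owner and a version that increases with every change. It is not weave's actual merge code; Entry and MergeEntry are hypothetical names.

```go
package main

import "fmt"

// Entry is a simplified IPAM ring entry: the peer that owns the range
// starting at Token, and a version assumed to increase on every change.
type Entry struct {
	Token   string
	Peer    string
	Version uint32
}

// MergeEntry reconciles this peer's entry for a token with an incoming one.
// It returns the entry to keep, or an error (which, in the logs, is followed
// by the connection being shut down). Illustrative model only.
func MergeEntry(self string, mine, theirs Entry) (Entry, error) {
	switch {
	case theirs.Version < mine.Version:
		return mine, nil // incoming entry is older: keep ours
	case theirs.Version == mine.Version:
		if theirs.Peer != mine.Peer {
			return mine, fmt.Errorf("Inconsistent entries for %s: owned by %s but incoming message says %s",
				mine.Token, mine.Peer, theirs.Peer)
		}
		return mine, nil
	case mine.Peer == self:
		// Assumption: a peer expects to be the only one that changes ranges
		// it owns, so a newer version naming another owner is a conflict.
		return mine, fmt.Errorf("Received update for IP range I own at %s v%d: incoming message says owner %s v%d",
			mine.Token, mine.Version, theirs.Peer, theirs.Version)
	default:
		return theirs, nil // newer entry for a range we don't own: accept it
	}
}

func main() {
	self := "72:07:06:06:16:3c"
	mine := Entry{Token: "100.96.0.0", Peer: self, Version: 109}
	theirs := Entry{Token: "100.96.0.0", Peer: "ba:2d:dc:a6:b6:54", Version: 110}
	if _, err := MergeEntry(self, mine, theirs); err != nil {
		fmt.Println(err)
	}
}
```

With the values from the first failed connection in the report (local v109, incoming v110 from ba:2d:dc:a6:b6:54), the model reproduces the logged message. Under this model, as long as the other peers keep gossiping a newer entry that names a different owner for a range this node believes it owns, every reconnection attempt fails the same way, which would match the repeated failures above.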

Network:

$ ip route
$ ip -4 -o addr
$ sudo iptables-save

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 19 (9 by maintainers)

Most upvoted comments

The logs are in my comment above.

Ha! I had no idea the little black triangle would open up to show more.