weave: weave-npc blocking connections with valid network policy after a period of time (2.6.0)

What you expected to happen?

Similar to #3761, we are seeing traffic blocked by weave-npc even though we are using network policies. I would expect NPC not to block traffic when a valid network policy allowing it is in place.

What happened?

We now consistently see (about once every 1-2 weeks) traffic being blocked between pods in a namespace where it was working fine earlier. When we debugged the issue, we found that the ipsets on the host did not have valid entries for the pods. After we restart weave on the host, the ipsets become populated again and traffic continues to flow.
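For reference, the check and the workaround look roughly like the following; the pod IP, node name, and label selector are illustrative, assuming the standard weave-net DaemonSet in kube-system.

On the affected node, the pod's IP is missing from the weave-npc ipsets:

# ipset list | grep <pod-ip>

Deleting the weave-net pod on that node makes the DaemonSet recreate it, after which the ipsets repopulate and traffic flows again:

$ kubectl -n kube-system get pods -o wide -l name=weave-net | grep <node-name>
$ kubectl -n kube-system delete pod <weave-net-pod-on-that-node>
# ipset list | grep <pod-ip>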

How to reproduce it?

We do not have an easy way to consistently reproduce this, but we are now seeing it nearly every week within one specific cluster.

Anything else we need to know?

Cloud provider: AWS, custom-built cluster using in-house automation.

Versions:

# ./weave version
weave script 2.6.0
# docker version
Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea838
 Built:             Wed Nov 13 07:29:52 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea838
  Built:            Wed Nov 13 07:28:22 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
# uname -a
Linux ip-10-0-173-150 4.15.0-1056-aws #58-Ubuntu SMP Tue Nov 26 15:14:34 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T16:54:35Z", GoVersion:"go1.12.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.6", GitCommit:"7015f71e75f670eb9e7ebd4b5749639d42e20079", GitTreeState:"clean", BuildDate:"2019-11-13T11:11:50Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

Logs:

Unfortunately, these logs do not include the weave logs from before the restart, but when we run into this issue again (in a week or so), we will get those logs and update this issue.

https://gist.github.com/naemono/31df744c7ee6b48dba7b554e06553f4b

When this issue is happening, we see a spike in weavenpc_blocked_connections_total in Prometheus (graph screenshot attached in the original issue).
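For anyone watching for the same symptom, this is roughly the query we graph and alert on, issued here against the Prometheus HTTP API (the Prometheus address is illustrative):

$ curl -sG 'http://prometheus.example:9090/api/v1/query' \
    --data-urlencode 'query=sum(rate(weavenpc_blocked_connections_total[5m])) by (instance)'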

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 23 (17 by maintainers)

Most upvoted comments

@naemono thanks for reporting the issue

When we debugged the issue, we found that the ipsets on the host did not have valid entries for the pods.

when we run into this issue again (in a week or so), we will get those logs and update this issue

Please gather the weave-npc logs and ipset dumps. Additions and deletions of ipset entries are logged, so we should be able to track under what scenario the ipsets go out of sync with the desired state.
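Roughly, something like the following should be enough, assuming the standard weave-net DaemonSet with a weave-npc container in kube-system (pod and file names are illustrative):

$ kubectl -n kube-system logs <weave-net-pod-on-affected-node> -c weave-npc > weave-npc.log
# ipset save > ipset-dump.txt
# iptables-save | grep -i weave > weave-iptables.txt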