cilium: Service CT entries leaking

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Accessing a ClusterIP from a pod creates a service CT entry in the CT map with a long timeout. Because the service CT entry never sees the connection close in the ingress (reply) direction, it stays in the map until that long timeout expires. I think this is a critical bug.
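Why a missing ingress close matters can be shown with a simplified model of how a TCP CT entry's expiry is refreshed. This is a hedged sketch, not Cilium's code: it assumes an entry only switches to a short close timeout once both `rx_closing` and `tx_closing` are set, and that otherwise a service TCP entry keeps a 6-hour lifetime. The constant names and values are assumptions believed to match Cilium's defaults.

```python
# Simplified, hypothetical model of how a TCP CT entry's expiry is refreshed.
# Both constants are assumptions (believed to match Cilium's defaults);
# they are not read from Cilium itself.
SERVICE_LIFETIME_TCP = 21600  # seconds (6 h) for an open or half-closed entry
CLOSE_TIMEOUT = 10            # seconds once both directions have closed


def next_expiry(now: int, rx_closing: bool, tx_closing: bool) -> int:
    """Return the expiry time set when a packet refreshes the entry at `now`."""
    if rx_closing and tx_closing:
        # FIN/RST seen in both directions: the entry can go away quickly.
        return now + CLOSE_TIMEOUT
    # Only one side (or neither) has closed: the long lifetime applies.
    return now + SERVICE_LIFETIME_TCP


# Half-closed service entry (client FIN seen on tx, nothing on rx).
print(next_expiry(0, rx_closing=False, tx_closing=True))  # 21600
# Fully closed entry.
print(next_expiry(0, rx_closing=True, tx_closing=True))   # 10
```

Under this model, an entry that only ever sees the client side close is indistinguishable from a live connection, which is exactly the leak described here.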

$ cilium bpf ct list global |grep 192.168.15.94
TCP OUT 192.168.15.94:80 -> 172.16.0.16:60462 service expires=113956 RxPackets=0 RxBytes=4209 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x1b LastTxReport=92356 Flags=0x0012 [ TxClosing SeenNonSyn ] RevNAT=116 SourceSecurityID=0 IfIndex=0
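Decoding the fields of this entry makes the problem visible. The sketch below parses the `key=value` tokens of the line and decodes the two flag fields. The TCP flag bits are standard; the CT flag bit positions (RxClosing=0x01, TxClosing=0x02, SeenNonSyn=0x10) are an assumption about Cilium 1.9's CT entry layout, chosen because it reproduces the CLI's own rendering of `0x0012` as `[ TxClosing SeenNonSyn ]`.

```python
# Sample entry copied from this report.
SAMPLE = (
    "TCP OUT 192.168.15.94:80 -> 172.16.0.16:60462 service expires=113956 "
    "RxPackets=0 RxBytes=4209 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 "
    "TxBytes=0 TxFlagsSeen=0x1b LastTxReport=92356 Flags=0x0012 "
    "[ TxClosing SeenNonSyn ] RevNAT=116 SourceSecurityID=0 IfIndex=0"
)

# Standard TCP header flag bits.
TCP_FLAGS = {0x01: "FIN", 0x02: "SYN", 0x04: "RST", 0x08: "PSH", 0x10: "ACK", 0x20: "URG"}

# Assumed (partial) bit layout of the CT entry "Flags" field.
CT_FLAGS = {0x01: "RxClosing", 0x02: "TxClosing", 0x10: "SeenNonSyn"}


def parse_fields(line: str) -> dict:
    """Collect the key=value tokens of one `cilium bpf ct list global` line."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def decode(value: int, names: dict) -> list:
    """Names of the bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & bit]


f = parse_fields(SAMPLE)
print(decode(int(f["TxFlagsSeen"], 16), TCP_FLAGS))  # ['FIN', 'SYN', 'PSH', 'ACK']
print(decode(int(f["RxFlagsSeen"], 16), TCP_FLAGS))  # []
print(decode(int(f["Flags"], 16), CT_FLAGS))         # ['TxClosing', 'SeenNonSyn']
print(int(f["expires"]) - int(f["LastTxReport"]))    # 21600
```

The FIN was seen in the tx direction, but no flags at all were recorded in the rx direction, so only TxClosing is set. Assuming `expires` and `LastTxReport` count seconds on the same clock, the entry was refreshed 21600 s (6 hours) before its expiry.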

If the ClusterIP is accessed very frequently, as in a high-load application, the number of service CT entries grows significantly:

$ cilium bpf ct list global |grep 192.168.15.94 |grep service |wc -l
14116
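The `grep | grep | wc -l` pipeline above counts entries for one address. To see which service addresses leak the most, a small helper can tally service entries for every address at once. This is a sketch that assumes the IPv4 line layout of the sample entry in this report (`<proto> <dir> <addr>:<port> -> <addr>:<port> service ...`); other layouts are not handled.

```python
from collections import Counter


def service_entries_by_address(lines) -> Counter:
    """Count service CT entries per address (the first address in each line).

    Assumes the IPv4 layout of the sample entry in this report:
    "<proto> <dir> <addr>:<port> -> <addr>:<port> service ...".
    """
    counts = Counter()
    for line in lines:
        tokens = line.split()
        # Skip anything that is not a service entry.
        if len(tokens) < 6 or tokens[5] != "service":
            continue
        addr, _, _port = tokens[2].rpartition(":")
        counts[addr] += 1
    return counts


example = [
    "TCP OUT 192.168.15.94:80 -> 172.16.0.16:60462 service expires=113956",
    "TCP OUT 192.168.15.94:80 -> 172.16.0.16:60463 service expires=113957",
]
print(service_entries_by_address(example).most_common())  # [('192.168.15.94', 2)]
```

Feeding it `sys.stdin` while piping in `cilium bpf ct list global` gives the per-address totals in one pass.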

Over time the CT map fills up, and new connections are then reset, as in https://github.com/cilium/cilium/issues/17457.
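How fast the map fills can be estimated with Little's law: the average number of resident entries is the arrival rate times how long each entry stays. The sketch below assumes a 6-hour lifetime for half-closed service entries and a 10-second timeout for fully closed ones; the 100 connections/s workload and the 524288-entry map capacity are hypothetical numbers for illustration only.

```python
# Back-of-the-envelope model of CT map pressure from leaked service entries.
# Little's law: resident entries ~= arrival rate * time each entry stays.


def steady_state_entries(conn_per_sec: float, lifetime_s: float) -> float:
    """Average number of live entries if every connection leaves one behind."""
    return conn_per_sec * lifetime_s


def seconds_until_full(conn_per_sec: float, capacity: int, existing: int = 0) -> float:
    """Time until a map of `capacity` entries fills, if none expire meanwhile."""
    return (capacity - existing) / conn_per_sec


rate = 100  # hypothetical new ClusterIP connections per second
print(steady_state_entries(rate, 21600))  # half-closed, assumed 6 h lifetime: 2160000
print(steady_state_entries(rate, 10))     # fully closed, assumed 10 s timeout: 1000
print(seconds_until_full(rate, 524288))   # hypothetical 524288-entry map: 5242.88
```

With these numbers the map is full after roughly 87 minutes, well before any leaked entry reaches its 6-hour expiry, whereas entries that reach the close timeout would keep the footprint around a thousand.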

I am going to fix it with the following logic:

  1. After reverse NAT (deNAT) restores the ClusterIP, look up the CT map again with the resulting tuple; if the entry exists, set entry->rx_closing to 1.
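The proposed fix can be modelled outside the datapath to check its effect on the flags. In this hypothetical sketch a Python dict stands in for the CT map (keyed by client address/port and service-or-backend address/port) and another dict stands in for the reverse-NAT table; none of these names are Cilium APIs. We also assume the extra update applies only to replies carrying FIN or RST, mirroring how the backend connection entry is handled; the report does not state this condition explicitly.

```python
from dataclasses import dataclass

FIN, RST = 0x01, 0x04  # TCP flag bits


@dataclass
class CtEntry:
    rx_closing: bool = False
    tx_closing: bool = False


def handle_reply(ct_map, revnat, backend, client, tcp_flags):
    """Hypothetical reply path: backend -> client, with the proposed extra lookup."""
    closing = bool(tcp_flags & (FIN | RST))
    # Existing behaviour: the backend connection entry sees the close.
    conn = ct_map.get((*client, *backend))
    if conn is not None and closing:
        conn.rx_closing = True
    # Reverse NAT (deNAT): backend address -> ClusterIP.
    vip = revnat[backend]
    # Proposed step: look up the CT map again with the ClusterIP tuple and,
    # if a service entry exists, mark it as closing in the rx direction too.
    svc = ct_map.get((*client, *vip))
    if svc is not None and closing:
        svc.rx_closing = True
    return vip


# Hypothetical backend 10.0.0.5:8080 behind ClusterIP 192.168.15.94:80.
client, backend, vip = ("172.16.0.16", 60462), ("10.0.0.5", 8080), ("192.168.15.94", 80)
ct = {(*client, *backend): CtEntry(tx_closing=True), (*client, *vip): CtEntry(tx_closing=True)}
handle_reply(ct, {backend: vip}, backend, client, FIN | 0x10)  # FIN+ACK reply
print(ct[(*client, *vip)])  # CtEntry(rx_closing=True, tx_closing=True)
```

Once both flags are set on the service entry, it becomes eligible for the short close timeout instead of lingering for the full service lifetime.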

Cilium Version

Client: 1.9.0 go version go1.15.4 linux/amd64
Daemon: 1.9.0 go version go1.15.4 linux/amd64

Kernel Version

Linux 4.14.105-19-0019 SMP Fri Jan 15 11:39:34 CST 2021 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.18", GitCommit:"6b913dbde30aa95b247be30a5318fb912f8fe29e", GitTreeState:"clean", BuildDate:"2021-08-11T10:20:21Z", GoVersion:"go1.15.11", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.18-57+776098ae2e7bf3-dirty", GitCommit:"776098ae2e7bf358cce0af0b0faf139fe66c6c48", GitTreeState:"dirty", BuildDate:"2021-09-01T07:38:52Z", GoVersion:"go1.15.11", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (11 by maintainers)

Most upvoted comments