linkerd2: proxy memory leak

We can reliably reproduce a proxy memory leak with the following setup:

:; kubectl create ns lifecycle
:; curl -s https://raw.githubusercontent.com/linkerd/linkerd-examples/master/lifecycle/lifecycle.yml |linkerd inject - |kubectl apply -f - -n lifecycle

Then, the leak can be observed by watching the linkerd-proxy container in the bb-broadcast pod, whose RSS slowly grows for as long as the process runs:

:; while true ; do date ; kubectl top po --containers -n lifecycle -l app=bb-broadcast |awk '$2 ~ /^linkerd-proxy$/ {print $0}' ; sleep 600 ; done 
Wed May 29 21:38:17 UTC 2019
bb-broadcast-8768bbf55-p62l4   linkerd-proxy   196m         9Mi    

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 18 (17 by maintainers)

Most upvoted comments

@olix0r I don’t believe that’s what’s happening here — it’s the connection attempt that’s timing out, the error message is “request timed out” only because it’s coming from the tower-timeout middleware wrapping the connect service.
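To illustrate the distinction, here is a minimal sketch (not the proxy's actual code; the address, timeout value, and dependency versions are all assumptions) of a connect service wrapped in tower's Timeout middleware. When the connect future never resolves, the error surfaced to the caller comes from the timeout middleware, even though no HTTP request was ever in flight:

// Cargo deps (assumed): tokio = { version = "1", features = ["full"] },
//                       tower = { version = "0.4", features = ["timeout", "util"] }
use std::time::Duration;

use tokio::net::TcpStream;
use tower::{service_fn, timeout::Timeout, Service, ServiceExt};

#[tokio::main]
async fn main() {
    // A bare "connect" service: the request is an address, the response is a
    // connected TcpStream.
    let connect = service_fn(|addr: &'static str| TcpStream::connect(addr));

    // Wrap the connector in tower's timeout middleware, analogous to the
    // timeout around the connect service in the proxy.
    let mut connect = Timeout::new(connect, Duration::from_secs(3));

    // 10.255.255.1:8888 is assumed unroutable, so the SYN is never answered
    // and the timeout fires at the connection layer; the error printed here
    // nonetheless comes from the timeout middleware.
    let result = connect
        .ready()
        .await
        .expect("service ready")
        .call("10.255.255.1:8888")
        .await;
    println!("{:?}", result.map(|_| "connected"));
}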

The above example only leaks when using kube-proxy for usermode proxying. In this mode, kube-proxy terminates the TCP connection. In iptables proxying, the connection never succeeds, so the request is never dispatched into hyper, etc.

Here’s an example TCP stream in the usermode case:

78110 3.466894035   10.1.1.155 → 10.1.1.1     TCP  74 39142 → 34397 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=406828503 TSecr=0 WS=128
78111 3.466903305 10.152.183.170 → 10.1.1.155   TCP  74 8888 → 39142 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=2708174506 TSecr=406828503 WS=128
78112 3.466909111   10.1.1.155 → 10.1.1.1     TCP  66 39142 → 34397 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=406828503 TSecr=2708174506
78113 3.466937435   10.1.1.155 → 10.1.1.1     TCP  90 39142 → 34397 [PSH, ACK] Seq=1 Ack=1 Win=29312 Len=24 TSval=406828503 TSecr=2708174506
78114 3.466943253 10.152.183.170 → 10.1.1.155   TCP  66 8888 → 39142 [ACK] Seq=1 Ack=25 Win=29056 Len=0 TSval=2708174506 TSecr=406828503
78115 3.466948496 10.152.183.170 → 10.1.1.155   TCP  66 8888 → 39142 [RST, ACK] Seq=1 Ack=25 Win=29056 Len=0 TSval=2708174506 TSecr=406828503

It seems likely we could reproduce this with any client that talks to a server that accepts and immediately closes a connection as soon as it reads some data…
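For example, a stand-in server along these lines (a hypothetical sketch, not part of this repro; port and buffer size are arbitrary) accepts a connection, reads whatever the client sends, and then closes it immediately, with SO_LINGER set to zero so the close shows up as a RST like the capture above:

// Cargo deps (assumed): tokio = { version = "1", features = ["full"] }
use std::time::Duration;

use tokio::io::AsyncReadExt;
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:8888").await?;
    loop {
        let (mut socket, _peer) = listener.accept().await?;
        tokio::spawn(async move {
            // Read whatever the client sends first...
            let mut buf = [0u8; 1024];
            let _ = socket.read(&mut buf).await;
            // ...then close right away. SO_LINGER=0 makes the close a RST
            // rather than a normal FIN, matching the packet capture above.
            let _ = socket.set_linger(Some(Duration::from_secs(0)));
            drop(socket);
        });
    }
}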