kubernetes: kubectl port-forward broken pipe

What happened:

We have a pod running in our k8s cluster that I connect to via kubectl port-forward. I am able to connect to this pod (using the following command) but then start getting broken pipe error messages after maintaining a connection for what is typically 30-60s.

kubectl port-forward --namespace monitoring deployment/cost-monitor 9090

There seems to be a correlation between error rate and the amount of data being transferred. I see the following error message initially:

E0225 15:20:06.212139 26392 portforward.go:363] error copying from remote stream to local connection: readfrom tcp6 [::1]:9090->[::1]:57794: write tcp6 [::1]:9090->[::1]:57794: write: broken pipe

These errors are often followed by timeout messages, though not necessarily immediately:

E0225 15:22:30.454203 26392 portforward.go:353] error creating forwarding stream for port 9090 -> 9090: Timeout occured

What you expected to happen: No error messages or major degradation in transfer rate.

How to reproduce it (as minimally and precisely as possible): Connect via port-forward and transfer ~5 MB over several minutes.

What else?: We have multiple HTTP requests being made at any given time on this port-forward connection.

Environment: Experiencing on both AWS kops and GKE

// kops Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-04T04:48:03Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"darwin/amd64"} Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.11", GitCommit:"637c7e288581ee40ab4ca210618a89a555b6e7e9", GitTreeState:"clean", BuildDate:"2018-11-26T14:25:46Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

// GKE Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-04T04:48:03Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"darwin/amd64"} Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.11-gke.1", GitCommit:"5c4fddf874319c9825581cc9ab1d0f0cf51e1dc9", GitTreeState:"clean", BuildDate:"2018-11-30T16:18:58Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
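
A minimal sketch of a reproduction along these lines, assuming the pod serves HTTP on 9090 and exposes some endpoint that returns a few hundred KB per request (the /metrics path below is only a placeholder):

kubectl port-forward --namespace monitoring deployment/cost-monitor 9090 &
sleep 2   # give the forward a moment to establish
# drive a few MB through the forward over several minutes
for i in $(seq 1 60); do
  curl -s -o /dev/null http://localhost:9090/metrics
  sleep 5
done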

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 162
  • Comments: 102 (25 by maintainers)

Most upvoted comments

I had a similar problem using kubectl port-forward and I resolved it with ulimit -n 65536 on Mac OS.

I ran ulimit -n 65536 as-is. You might need sudo on your system. This increases the file descriptor limit of the local shell where you’re running kubectl port-forward.

My hypothesis is that kubectl port-forward doesn’t clean up its sockets properly so the local shell runs into the file descriptor limit after some time under high load (or maybe after terminating a few times). This seemed to stop port-forward from breaking all the time when I was running automated tests against a K8s service I was trying to debug.
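
A quick way to sanity-check that hypothesis (a sketch; pgrep/lsof availability and the exact limits vary by OS):

ulimit -n                                      # current open-file limit of this shell
KPID=$(pgrep -f "kubectl port-forward" | head -n1)
lsof -p "$KPID" | grep -c TCP                  # rough count of TCP sockets kubectl is holding
ulimit -n 65536                                # raise the limit in this shell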

Just noticed this myself with a fresh single node installation.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:28:09Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:20:00Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

Port forwarding didn’t work at first

$ kubectl -n argo port-forward deployment/argo-server 2746:2746
Forwarding from 127.0.0.1:2746 -> 2746
Forwarding from [::1]:2746 -> 2746
Handling connection for 2746
Handling connection for 2746
Handling connection for 2746
E0128 10:49:21.896317   13138 portforward.go:372] error copying from remote stream to local connection: readfrom tcp4 127.0.0.1:2746->127.0.0.1:65062: write tcp4 127.0.0.1:2746->127.0.0.1:65062: write: broken pipe
Handling connection for 2746
E0128 10:49:51.866614   13138 portforward.go:340] error creating error stream for port 2746 -> 2746: Timeout occured
E0128 10:49:52.849451   13138 portforward.go:340] error creating error stream for port 2746 -> 2746: Timeout occured
Handling connection for 2746
E0128 10:50:22.857444   13138 portforward.go:340] error creating error stream for port 2746 -> 2746: Timeout occured
Handling connection for 2746
E0128 10:50:52.863231   13138 portforward.go:340] error creating error stream for port 2746 -> 2746: Timeout occured
Handling connection for 2746
Handling connection for 2746
E0128 10:51:22.904116   13138 portforward.go:340] error creating error stream for port 2746 -> 2746: Timeout occured
E0128 10:51:22.904268   13138 portforward.go:340] error creating error stream for port 2746 -> 2746: Timeout occured
Handling connection for 2746
Handling connection for 2746
^C%

When I let kubectl decide the host port, port-forwarding worked!

$ kubectl -n argo port-forward deployment/argo-server :2746
Forwarding from 127.0.0.1:65199 -> 2746
Forwarding from [::1]:65199 -> 2746
Handling connection for 65199
Handling connection for 65199
Handling connection for 65199
Handling connection for 65199
Handling connection for 65199
Handling connection for 65199
Handling connection for 65199
Handling connection for 65199
Handling connection for 65199
Handling connection for 65199
^CHandling connection for 65199

Then forwarding to the same port on the host worked!

$ kubectl -n argo port-forward deployment/argo-server 2746:2746
Forwarding from 127.0.0.1:2746 -> 2746
Forwarding from [::1]:2746 -> 2746
Handling connection for 2746
Handling connection for 2746
Handling connection for 2746
Handling connection for 2746
Handling connection for 2746
Handling connection for 2746
Handling connection for 2746

What's the fix for this?

I'm trying to locate and fix this issue [WIP].

Thanks for https://github.com/kargakis/k8s-74551-reproducer


environment:

kubectl, kube-apiserver, kubelet, all from master, commit 1f0e718585af775ba847f6fc36fa287e3f2ecd19
containerd:  v1.6.19 1e1ea6e986c6c86565bc33d52e34b81b3e2bc71f

[Please correct me if I'm wrong]

Here are some pictures for troubleshooting:


[image: kubectl port-forward general process]

[image: pcap in the target container namespace]

[image: reproducible results]

The reason identified so far is:


So I made some fixes to containerd: https://github.com/sxllwx/containerd/commit/28755fff7d67b64576054e4fbc4845e116d92b63

With this patch applied to containerd, the previously failing example runs fine for me.

I will continue to follow up on this issue.

I had success with @anthcor's solution: let kubectl choose the local port.

kubectl  port-forward svc/foo :8080
Forwarding from 127.0.0.1:57708 -> 8080
Forwarding from [::1]:57708 -> 8080
Handling connection for 57708
Handling connection for 57708 ... etc

I ALSO had success by specifying the local address as 127.0.0.1 (I don't need IPv6). The WEIRD thing is that after doing this I can go back to the original form, kubectl port-forward svc/foo 1234:8080, and it works again. This smells like a socket reuse issue.
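
For reference, the two workarounds in explicit command form (svc/foo and the ports are placeholders):

kubectl port-forward svc/foo :8080                           # let kubectl pick a free local port
kubectl port-forward --address 127.0.0.1 svc/foo 1234:8080   # bind only the IPv4 loopback address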

This is docker-desktop on a mac.

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:15:20Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

/assign

I have a similar problem; it occurs in an Azure k8s cluster (client: v1.19.3, server: v1.17.3) and in a local kind cluster (client: v1.19.3, server: v1.19.1). It looks very similar to @ysimonson's suggestion here, and (I guess) it's related to HTTP streams.

The problem occurs when you stop downloading a webpage/file while the download is in progress. Then the broken pipe logs appear, and after a while "Timeout occured". After that, everything stops and the connection is unusable.

Forwarding from 127.0.0.1:8080 -> 80
Handling connection for 8080
Handling connection for 8080
E1105 16:49:04.050790   90091 portforward.go:385] error copying from local connection to remote stream: read tcp4 127.0.0.1:8080->127.0.0.1:56050: read: connection reset by peer
E1105 16:49:04.051655   90091 portforward.go:372] error copying from remote stream to local connection: readfrom tcp4 127.0.0.1:8080->127.0.0.1:56050: write tcp4 127.0.0.1:8080->127.0.0.1:56050: write: broken pipe
Handling connection for 8080
E1105 16:49:04.351923   90091 portforward.go:385] error copying from local connection to remote stream: read tcp4 127.0.0.1:8080->127.0.0.1:56052: read: connection reset by peer
Handling connection for 8080
E1105 16:49:04.354772   90091 portforward.go:372] error copying from remote stream to local connection: readfrom tcp4 127.0.0.1:8080->127.0.0.1:56052: write tcp4 127.0.0.1:8080->127.0.0.1:56052: write: broken pipe
E1105 16:49:04.712995   90091 portforward.go:385] error copying from local connection to remote stream: read tcp4 127.0.0.1:8080->127.0.0.1:56058: read: connection reset by peer
Handling connection for 8080
E1105 16:49:04.715697   90091 portforward.go:372] error copying from remote stream to local connection: readfrom tcp4 127.0.0.1:8080->127.0.0.1:56058: write tcp4 127.0.0.1:8080->127.0.0.1:56058: write: broken pipe
E1105 16:49:34.715510   90091 portforward.go:340] error creating error stream for port 8080 -> 80: Timeout occured

I don't know exactly how k8s/kubelet works in detail, but these logs from containerd always appear when the connection breaks.

Nov 05 18:57:04 kind-control-plane containerd[48]: E1105 18:57:04.975425      48 httpstream.go:143] (conn=&{0xc0004f7e40 [0xc000b70b40 0xc000b70be0 0xc0005903c0 0xc000b70c80 0xc000b70d20] {0 0} 0x55fe64dbd9d0}, request=2) timed out waiting for streams

When the problem occurs in "production", these logs show hundreds of connections inside the [].

Long story short: I prepared a reproduction of this bug with GitHub Actions, and the problem occurs in this workflow: https://github.com/velmafia/k8s_issue-74551/actions/runs/346450615. The kind logs are stored in the artifacts; feel free to download/fork and retest with your version of k8s as well.
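
A hedged local sketch of the same cancel-mid-download scenario using kind (the image, the /big.bin path, and the timings are assumptions):

kind create cluster
kubectl run web --image=nginx --port=80
kubectl wait --for=condition=Ready pod/web --timeout=120s
kubectl port-forward pod/web 8080:80 &
sleep 2
# abort a large download mid-stream; /big.bin stands in for any multi-MB object the pod serves
curl --max-time 1 -s -o /dev/null http://127.0.0.1:8080/big.bin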

We're also seeing similar issues at Twitter in our usage of kubectl port-forward.

I have created a simple client+server reproducer at https://github.com/kargakis/k8s-74551-reproducer

This issue is unfortunate; kubectl port-forward makes it very easy for our less kube-experienced developers to leverage our production deployment for development.

Hi, I’m hitting this on my cluster too.

Here's what we're currently running: Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-04T04:48:03Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"darwin/amd64"} Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.6", GitCommit:"b1d75deca493a24a2f87eb1efde1a569e52fc8d9", GitTreeState:"clean", BuildDate:"2018-12-16T04:30:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Any additional repro info that I can provide?

This happens to me when port-forwarding to MinIO and trying to download large files. I get: E0706 18:38:06.048525 1 portforward.go:340] error creating error stream for port 9000 -> 9000: Timeout occured

Wow! I wasn’t expecting that culprit! great find!

One of the reasons (there can be more) is that the connection can sit idle and some intermediate device with a low timer closes the TCP session, so it eventually times out at one or both endpoints of the connection. SSH can be configured to send periodic keepalives to keep traffic flowing and renew the timeouts, or maybe you are actively using the session … This is why it is important to know whether you hit the issue with a continuous TCP stream.
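
One way to test the idle-timeout hypothesis is to keep a trickle of traffic flowing over the forward and see whether the timeouts still appear (a sketch; svc/foo and /healthz are placeholders):

kubectl port-forward svc/foo 8080:80 &
while true; do
  curl -s -o /dev/null http://127.0.0.1:8080/healthz   # periodic request acts as an application-level keepalive
  sleep 10
done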

This issue is waiting for a contributor to dig in and diagnose. My current guess is recorded above in the comment here.

I can also reliably reproduce it, by trying to fetch a 6MB file over a simple http server.

I also get this, especially when using kubectl port-forward for redis or postgres or haproxy.

E0807 23:29:49.561205   87544 portforward.go:303] error copying from remote stream to local connection: readfrom tcp4 127.0.0.1:5432->127.0.0.1:52434: write tcp4 127.0.0.1:5432->127.0.0.1:52434: write: broken pipe

I’m seeing this issue as well. I think kubelet is not properly closing the error channel associated with port forwarded connections, because I’m seeing the port forwarder get stuck here.

You don’t even need to send a lot of data, just break the reading end of the connection from the client-side while the port forwarder is trying to write. Assuming you’re port forwarding an HTTP server available on port 30600 locally, this will reliably reproduce the issue on k8s for docker for mac (server version v1.10.11):

#!/usr/bin/env python3

import datetime
import http.client

# Open a connection, issue a request, then close it without ever reading the
# response body. Closing while the port forwarder is still writing the body
# breaks the reading end and reliably triggers the broken pipe error.
for _ in range(10):
    conn = http.client.HTTPConnection("localhost:30600")
    conn.request("GET", "/")
    conn.getresponse()  # reads headers only; the body is left unread
    conn.close()        # tear down mid-transfer
    print(datetime.datetime.now())

I can reliably reproduce the issue on a newer version of k8s + Ubuntu + minikube as well, but it is more resilient, and the above script won't trigger the problem. I'm not sure yet how to build a minimal reproducible test for that target.

@sxllwx thanks for your work! Do you think that this issue kubernetes/kubectl#1368, is also caused by the same problem you identified?

Looks like the same problem. You can try the containerd branch at https://github.com/sxllwx/containerd/tree/fix/k8s-issue-74551; I believe it resolves this issue.

I’m also pushing this PR to be merged.

Looks like it’s the last commit in this branch: https://github.com/sxllwx/containerd/commits/fix/k8s-issue-74551
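
For anyone who wants to try that branch on a test node, a rough build-and-swap sketch (the binary path and service name are assumptions; don't do this on a production node):

git clone https://github.com/sxllwx/containerd && cd containerd
git checkout fix/k8s-issue-74551
make
sudo systemctl stop containerd
sudo cp bin/containerd /usr/local/bin/containerd   # adjust to wherever your containerd binary lives
sudo systemctl start containerd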

(quoting the earlier "let kubectl choose the local port" workaround)

Wow, it really works! Thanks!

After adding the following to our haproxy frontend, this issue has pretty much been eliminated:

timeout client 4h

and the following to the backend:

timeout server 4h

Since making that change, we have seen absolutely zero timeouts for an idle port-forward for quite some time. There must be similar options if you have a different load balancer in front of your Kubernetes API, so perhaps this is something that people who are affected can try.
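
A sketch of where those lines might sit in an haproxy config fronting the apiserver (section names and addresses are placeholders for illustration):

frontend kube-apiserver
    bind *:6443
    timeout client 4h          # keep long-lived idle client connections (the port-forward stream) open
    default_backend apiserver-nodes

backend apiserver-nodes
    timeout server 4h          # matching long timeout on the server side
    server master1 10.0.0.10:6443 check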

Is there any fix without upgrading containerd? I don't have access.

I've had some success in getting better reliability by reducing the MTU of the network interface.
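
For anyone who wants to experiment with that, a hedged sketch (interface names are placeholders; pick a value below your path MTU):

sudo ip link set dev eth0 mtu 1400      # Linux; eth0 is a placeholder interface name
sudo ifconfig en0 mtu 1400              # macOS equivalent; en0 is a placeholder interface name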

An interesting side note: I was getting this same error using curl commands against a gRPC service being port-forwarded to… I tried various things mentioned in this issue, but none of them worked. I then tried a Go program that hit the same port-forwarded address, and the error did not occur. No idea why, but I thought it was worth mentioning for others seeing this error.

https://github.com/kubernetes/kubernetes/issues/74551#issuecomment-910520361 is the best short-term fix imo. I have run into this many times while testing many services port-forwarded to my local network.

I did not have success with @anthcor's workaround. Hope this gets fixed.

/assign

@aojea the original issue and I both have the error:

error copying from remote stream to local connection ... write: broken pipe

Full errors:

// https://github.com/kubernetes/kubernetes/issues/74551
E0225 15:20:06.212139 26392 portforward.go:363] error copying from remote stream to local connection: readfrom tcp6 [::1]:9090->[::1]:57794: write tcp6 [::1]:9090->[::1]:57794: write: broken pipe

// https://github.com/kubernetes/kubernetes/issues/74551#issuecomment-519361205
E0807 23:29:49.561205  87544 portforward.go:303] error copying from remote stream to local connection: readfrom tcp4 127.0.0.1:5432->127.0.0.1:52434: write tcp4 127.0.0.1:5432->127.0.0.1:52434: write: broken pipe

I think the other errors reported in the comments should be addressed in separate issues, so we can nail down why this specific error happens and fix it.

I get the error when I port-forward directly to an haproxy, redis, or postgres pod and try to read a lot of data through it. There is no load balancer in front of the apiserver. We're using AWS EKS with version v1.13.12-eks-eb1860 and client version v1.16.3.
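
A sketch of that read-heavy pattern (pod names, database names, and credentials are placeholders):

kubectl port-forward pod/postgres-0 5432:5432 &
pg_dump -h 127.0.0.1 -p 5432 -U postgres mydb > /dev/null    # pull a large dump through the forward

kubectl port-forward pod/redis-0 6379:6379 &
redis-cli -h 127.0.0.1 -p 6379 --rdb /tmp/dump.rdb           # stream the whole dataset through the forward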

usePlaintext()

Can you say more about that? That's suspicious; if the problem is binary data, maybe there's some specific byte sequence that is tickling a bug in the proxy.

I also got the same kind of problem as @rihardsk. When looking at the kube-registry logs from the container, at least the following error logs were seen:

time="2019-08-18T14:28:00Z" level=error msg="response completed with error" err.code="blob unknown" err.detail=sha256:bbab4ec87ac4f89eaabdf68dddbd1dd930e3ad43bded38d761b89abf9389a893 err.message="blob unknown to registry" go.version=go1.6.3 http.request.host="localhost:5000" http.request.id=969f3ee4-aa19-4821-9d74-0e5a7639deba http.request.method=HEAD http.request.remoteaddr="127.0.0.1:51570" http.request.uri="/v2/sdltestapp/blobs/sha256:bbab4ec87ac4f89eaabdf68dddbd1dd930e3ad43bded38d761b89abf9389a893" http.request.useragent="docker/19.03.1 go/go1.12.5 git-commit/74b1e89 kernel/3.10.0-957.27.2.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/19.03.1 \\(linux\\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=98.755941ms http.response.status=404 http.response.written=157 instance.id=be7e4ed5-99dd-49b8-93b6-e5dbbd0c2aac vars.digest="sha256:bbab4ec87ac4f89eaabdf68dddbd1dd930e3ad43bded38d761b89abf9389a893" vars.name=sdltestapp version=v2.5.1 
127.0.0.1 - - [18/Aug/2019:14:28:00 +0000] "HEAD /v2/sdltestapp/blobs/sha256:bbab4ec87ac4f89eaabdf68dddbd1dd930e3ad43bded38d761b89abf9389a893 HTTP/1.1" 404 157 "" "docker/19.03.1 go/go1.12.5 git-commit/74b1e89 kernel/3.10.0-957.27.2.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/19.03.1 \\(linux\\))"

time="2019-08-18T14:28:01Z" level=error msg="unknown error reading request payload: write /var/lib/registry/docker/registry/v2/repositories/sdltestapp/_uploads/840c6e4f-4c28-4b24-ad52-be8753612e39/data: cannot allocate memory" go.version=go1.6.3 http.request.host="localhost:5000" http.request.id=ecf2db39-cbfa-4b52-ad0b-af6b839d931e http.request.method=PATCH http.request.remoteaddr="127.0.0.1:51458" http.request.uri="/v2/sdltestapp/blobs/uploads/840c6e4f-4c28-4b24-ad52-be8753612e39?_state=ebA5Em9IUSedYo_D2I8p31EGKwkIdRzlXfnwPR5U0CB7Ik5hbWUiOiJzZGx0ZXN0YXBwIiwiVVVJRCI6Ijg0MGM2ZTRmLTRjMjgtNGIyNC1hZDUyLWJlODc1MzYxMmUzOSIsIk9mZnNldCI6MCwiU3RhcnRlZEF0IjoiMjAxOS0wOC0xOFQxNDoyNzo0NS4wODA2NDk3MDlaIn0%3D" http.request.useragent="docker/19.03.1 go/go1.12.5 git-commit/74b1e89 kernel/3.10.0-957.27.2.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/19.03.1 \\(linux\\))" instance.id=be7e4ed5-99dd-49b8-93b6-e5dbbd0c2aac vars.name=sdltestapp vars.uuid=840c6e4f-4c28-4b24-ad52-be8753612e39 version=v2.5.1 

time="2019-08-18T14:28:03Z" level=error msg="response completed with error" err.code=unknown err.detail="write /var/lib/registry/docker/registry/v2/repositories/sdltestapp/_uploads/840c6e4f-4c28-4b24-ad52-be8753612e39/data: cannot allocate memory" err.message="unknown error" go.version=go1.6.3 http.request.host="localhost:5000" http.request.id=ecf2db39-cbfa-4b52-ad0b-af6b839d931e http.request.method=PATCH http.request.remoteaddr="127.0.0.1:51458" http.request.uri="/v2/sdltestapp/blobs/uploads/840c6e4f-4c28-4b24-ad52-be8753612e39?_state=ebA5Em9IUSedYo_D2I8p31EGKwkIdRzlXfnwPR5U0CB7Ik5hbWUiOiJzZGx0ZXN0YXBwIiwiVVVJRCI6Ijg0MGM2ZTRmLTRjMjgtNGIyNC1hZDUyLWJlODc1MzYxMmUzOSIsIk9mZnNldCI6MCwiU3RhcnRlZEF0IjoiMjAxOS0wOC0xOFQxNDoyNzo0NS4wODA2NDk3MDlaIn0%3D" http.request.useragent="docker/19.03.1 go/go1.12.5 git-commit/74b1e89 kernel/3.10.0-957.27.2.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/19.03.1 \\(linux\\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=17.792958322s http.response.status=500 http.response.written=212 instance.id=be7e4ed5-99dd-49b8-93b6-e5dbbd0c2aac vars.name=sdltestapp vars.uuid=840c6e4f-4c28-4b24-ad52-be8753612e39 version=v2.5.1 

Edit: After increasing the memory limit (from 100Mi to 200Mi) of the registry container, I didn’t see the problem anymore.
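
For reference, one hedged way to apply that kind of limit change (the deployment, container, and namespace names here are assumptions):

kubectl -n kube-system set resources deployment/kube-registry --containers=registry --limits=memory=200Mi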