traefik: Getting "Kubernetes connection error failed to decode watch event : unexpected EOF" every minute in Traefik log
I am testing Traefik 1.1.0-rc1 as an ingress controller for a Kubernetes cluster. I’ve noticed that the log for Traefik has this error message showing up every minute on the dot:
time="2016-10-12T13:29:49Z" level=error msg="Kubernetes connection error failed to decode watch event: GET \"https://10.254.0.1:443/apis/extensions/v1beta1/ingresses\" : unexpected EOF, retrying in 709.591937ms"
At the exact same instant, I notice a corresponding panic in the log of the Kubernetes API server when using Kubernetes 1.4+. (I’ve tested against 1.3.7, 1.3.8, 1.4.0, and 1.4.1. No panic shows in the log on the 1.3 versions, but I get the same error log from Traefik regardless).
This is the panic observed from Kubernetes 1.4.1:
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: I1012 13:29:49.792607 9712 logs.go:41] http: panic serving 10.75.16.51:60576: kill connection/stream
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: goroutine 21581 [running]:
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: net/http.(*conn).serve.func1(0xc823f5d980)
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: /usr/local/go/src/net/http/server.go:1389 +0xc1
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: panic(0x3601100, 0xc82024dc70)
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: /usr/local/go/src/runtime/panic.go:443 +0x4e9
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: k8s.io/kubernetes/pkg/apiserver.(*baseTimeoutWriter).timeout(0xc8235d8ca0, 0x0, 0x0)
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:321 +0x50f
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: k8s.io/kubernetes/pkg/apiserver.(*timeoutHandler).ServeHTTP(0xc820545520, 0x7fce6fee43e8, 0xc82345c680, 0xc8222e8620)
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:201 +0x211
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: k8s.io/kubernetes/pkg/apiserver.MaxInFlightLimit.func1(0x7fce6fee43e8, 0xc82345c680, 0xc8222e8620)
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:119 +0x11d
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: net/http.HandlerFunc.ServeHTTP(0xc822260db0, 0x7fce6fee43e8, 0xc82345c680, 0xc8222e8620)
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: /usr/local/go/src/net/http/server.go:1618 +0x3a
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: net/http.serverHandler.ServeHTTP(0xc82227e800, 0x7fce6fee43e8, 0xc82345c680, 0xc8222e8620)
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: /usr/local/go/src/net/http/server.go:2081 +0x19e
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: net/http.(*conn).serve(0xc823f5d980)
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: /usr/local/go/src/net/http/server.go:1472 +0xf2e
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: created by net/http.(*Server).Serve
Oct 12 13:29:49 ip-10-75-16-50 kube-apiserver[9712]: /usr/local/go/src/net/http/server.go:2137 +0x44e
It takes exactly one minute for the first error message to show up after the Traefik pod starts. After a bit of digging, the default timeout for requests to the Kubernetes API server appears to be one minute, so it looks like Traefik initiates a request that then times out after a minute. If I get on the Kubernetes master and curl the /apis/extensions/v1beta1/ingresses endpoint, it returns the expected information immediately with no error. I can also list ingress information with kubectl without error. I'm not sure which side is misbehaving, but since other tools don't seem to have any issue requesting this information, Traefik seemed like the right place to start. Note that despite this error, ingresses still show up and are accessible. I can provide more details about my cluster setup and Traefik configuration if needed.
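For reference, the manual checks described above look roughly like this when run from the master (the address comes from the error message above; the bearer token and any other auth flags are placeholders for however your cluster authenticates):
# Plain GET against the ingresses endpoint; returns the list immediately:
curl -k -H "Authorization: Bearer $TOKEN" https://10.254.0.1:443/apis/extensions/v1beta1/ingresses
# Same information via kubectl:
kubectl get ingress --all-namespaces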
About this issue
- State: closed
- Created 8 years ago
- Reactions: 6
- Comments: 20 (15 by maintainers)
I also got this problem. The problem seems to be that "true" is missing in the watch URLs.
A watch request without watch=true gives the reported errors and panics, while a request with watch=true works fine and waits as long as specified by the Kubernetes --min-request-timeout flag (default 1800s), roughly as sketched below.
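Presumably the two request forms differ along these lines (the address is taken from the error message above; the exact query strings are an assumption based on the description, and auth flags are omitted):
# Missing "true": hits the global timeout after ~1 minute, the API server
# panics with "kill connection/stream", and the client sees "unexpected EOF":
curl -k "https://10.254.0.1:443/apis/extensions/v1beta1/ingresses?watch"
# With watch=true: treated as a long-running watch and held open for as long
# as --min-request-timeout allows:
curl -k "https://10.254.0.1:443/apis/extensions/v1beta1/ingresses?watch=true"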
I tried to understand the Kubernetes code around timeout handling. A default global timeout is applied to any request that does not match one of the known long-running request URL patterns, and that check also requires watch=true in the query string. The global timeout is enforced by panicking in the handler, so the panic above is expected behavior.
I tried the curl you asked for. Note that this test was done against Kubernetes 1.4.4, which was just released.
I deleted my Traefik pods so that nothing would be causing panics in the API server log. I then followed the API server log in one terminal, and ran the curl in a second terminal. The curl initially spit out data about all the ingresses currently in the cluster (they showed as being added) and then hung (as expected). I then let it sit for 15 minutes. During the 15 minutes, no panics were observed in the API server log, and the curl just sat there.
I then did another test where I started the curl and let it sit for several minutes, and then added an ingress to the cluster. The curl printed data about the ingress being added. I deleted the ingress, and the curl printed data about the ingress being deleted.
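For context, the watch curl in these tests was presumably of the same shape as the working form above, roughly (address and credentials are placeholders):
# Long-running watch; streams JSON events (ADDED/MODIFIED/DELETED) as they happen.
# -N disables output buffering so events show up immediately.
curl -k -N -H "Authorization: Bearer $TOKEN" \
  "https://10.254.0.1:443/apis/extensions/v1beta1/ingresses?watch=true"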