kubernetes: API server panics when writing response

When I was running a large-scale kubemark test (2000 Nodes with 95 pods each), my API server crashed with the following error:

I0715 12:14:23.387809    3511 trace.go:61] Trace "List /api/v1/namespaces/e2e-tests-density-idftq/pods" (started 2016-07-15 12:13:01.712298253 +0000 UTC):
[102.412µs] [102.412µs] About to List from storage
[2.53469971s] [2.534597298s] Listing from storage done
[2.954350982s] [419.651272ms] Self-linking done
[1m21.675466193s] [1m18.721115211s] Writing http response done (130003 items)
[1m21.675467856s] [1.663µs] END
E0715 12:14:23.393199    3511 apiserver.go:210] recover from panic situation: - Header called after Handler finished
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:206
    /usr/local/go/src/runtime/asm_amd64.s:472
    /usr/local/go/src/runtime/panic.go:443
    /usr/local/go/src/net/http/h2_bundle.go:4452
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/httplog/log.go:170
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:238
    <autogenerated>:8
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/metrics/metrics.go:102
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:272
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:120
    /usr/local/go/src/net/http/server.go:1618
    /usr/local/go/src/net/http/server.go:1910
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:485
    /usr/local/go/src/net/http/server.go:1618
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:420
    /usr/local/go/src/net/http/server.go:1618
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/auth/handlers/handlers.go:66
    /usr/local/go/src/net/http/server.go:1618
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/api/requestcontext.go:101
    /usr/local/go/src/net/http/server.go:1618
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/api/requestcontext.go:101
    /usr/local/go/src/net/http/server.go:1618
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:194
    /usr/local/go/src/runtime/asm_amd64.s:1998

I0715 12:14:23.387986    3511 handlers.go:164] GET /api/v1/nodes/10.180.12.88: (144.600059ms) 200 [[kubemark/v1.4.0 (linux/amd64) kubernetes/12483d9] 146.148.109.243:33217]
I0715 12:14:23.388150    3511 handlers.go:164] GET /api/v1/nodes/10.180.1.92: (154.464521ms) 200 [[kubemark/v1.4.0 (linux/amd64) kubernetes/12483d9] 104.197.214.41:58039]
panic: Header called after Handler finished [recovered]
        panic: Header called after Handler finished

goroutine 54829830 [running]:
panic(0x20f0220, 0xc821e07010)
        /usr/local/go/src/runtime/panic.go:481 +0x3e6
net/http.(*http2responseWriter).Header(0xc956288948, 0x7fe725a76c88)
        /usr/local/go/src/net/http/h2_bundle.go:4452 +0x78
k8s.io/kubernetes/pkg/httplog.(*respLogger).Header(0xc9702bd260, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/httplog/log.go:170 +0x3a
k8s.io/kubernetes/pkg/apiserver.(*baseTimeoutWriter).Header(0xc9c39fb9e0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:238 +0x39
k8s.io/kubernetes/pkg/apiserver.logStackOnRecover(0x7fe725d5b8a8, 0xc82014ad90, 0x20f0220, 0xc8f69cd900, 0x7fe725b36be8, 0xc8fc256620)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/apiserver.go:213 +0x2d9
k8s.io/kubernetes/pkg/apiserver.InstallRecoverHandler.func1(0x20f0220, 0xc8f69cd900, 0x7fe725b36be8, 0xc8fc256620)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/apiserver.go:195 +0x5e
k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.(*Container).dispatch.func2(0xc82032cea0, 0xc83a235710)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:206 +0x6a
panic(0x20f0220, 0xc8f69cd900)
        /usr/local/go/src/runtime/panic.go:443 +0x4e9
net/http.(*http2responseWriter).Header(0xc956288948, 0x2)
        /usr/local/go/src/net/http/h2_bundle.go:4452 +0x78
k8s.io/kubernetes/pkg/httplog.(*respLogger).Header(0xc9702bd260, 0x2decc58)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/httplog/log.go:170 +0x3a
k8s.io/kubernetes/pkg/apiserver.(*baseTimeoutWriter).Header(0xc9c39fb9e0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:238 +0x39
k8s.io/kubernetes/pkg/apiserver/metrics.(*fancyResponseWriterDelegator).Header(0xc8fc256630, 0xc8bc4a8e10)
        <autogenerated>:8 +0x53
k8s.io/kubernetes/pkg/apiserver/metrics.InstrumentRouteFunc.func1(0xc8bee93a10, 0xc90d296780)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/metrics/metrics.go:102 +0x35c
k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.(*Container).dispatch(0xc82032cea0, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:272 +0xf30
k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.(*Container).(k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.dispatch)-fm(0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:120 +0x3e
net/http.HandlerFunc.ServeHTTP(0xc820ceb980, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /usr/local/go/src/net/http/server.go:1618 +0x3a
net/http.(*ServeMux).ServeHTTP(0xc8202c07b0, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /usr/local/go/src/net/http/server.go:1910 +0x17d
k8s.io/kubernetes/pkg/apiserver.WithAuthorizationCheck.func1(0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:485 +0xf4
net/http.HandlerFunc.ServeHTTP(0xc820313ac0, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/apiserver.WithImpersonation.func1(0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:420 +0xfe
net/http.HandlerFunc.ServeHTTP(0xc820313b00, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/auth/handlers.NewRequestAuthenticator.func1(0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/auth/handlers/handlers.go:66 +0x2c9
net/http.HandlerFunc.ServeHTTP(0xc8203f5270, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/api.NewRequestContextFilter.func1(0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/api/requestcontext.go:101 +0x157
net/http.HandlerFunc.ServeHTTP(0xc820522660, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/api.NewRequestContextFilter.func1(0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/api/requestcontext.go:101 +0x157
net/http.HandlerFunc.ServeHTTP(0xc820522680, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
        /usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/apiserver.(*timeoutHandler).ServeHTTP.func1(0xc8216ec9c0, 0x7fe725b36ba8, 0xc8fc256620, 0xc83c513260, 0xc90d296720)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:194 +0x91
created by k8s.io/kubernetes/pkg/apiserver.(*timeoutHandler).ServeHTTP
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:196 +0x15b

It happened when there were around 110k-120k Pods in the cluster.

It looks like a bug in http2 handling. cc @timothysc @smarterclayton @wojtek-t @lavalamp @kubernetes/sig-api-machinery

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 50 (45 by maintainers)

Most upvoted comments

Hi, I own Go’s net/http and http2 stuff.

I saw https://github.com/golang/go/issues/16451 but it appears that the bug is actually in Kubernetes.

When you get the error message:

“panic: Header called after Handler finished”

That means the ResponseWriter.Header() method was called after the handler’s ServeHTTP method had completed.
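
To illustrate the contract violation, here is a minimal sketch (mine, not from Kubernetes; leakyHandler is a made-up name): a handler that lets a goroutine outlive ServeHTTP while still holding the ResponseWriter. On Go’s HTTP/2 server this is what produces the panic above; on HTTP/1 it is a silent data race.

package main

import (
        "net/http"
        "time"
)

// leakyHandler returns immediately but leaves a goroutine behind that
// still uses the ResponseWriter it was handed. On Go's HTTP/2 server
// (a TLS listener with h2 enabled, as the apiserver uses) the late
// Header() call panics with "Header called after Handler finished";
// on the plain HTTP/1 server below it is a silent data race instead.
func leakyHandler(w http.ResponseWriter, r *http.Request) {
        go func() {
                time.Sleep(100 * time.Millisecond) // outlive ServeHTTP
                w.Header().Set("X-Too-Late", "1")  // use after ServeHTTP returned
        }()
        // ServeHTTP returns here; the ResponseWriter must not be touched afterwards.
}

func main() {
        http.HandleFunc("/", leakyHandler)
        http.ListenAndServe(":8080", nil)
}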

So I went up your panic’s call stack looking for suspects.

My first guess was fancyResponseWriterDelegator,

k8s.io/kubernetes/pkg/apiserver/metrics.(*fancyResponseWriterDelegator).Header(0xc8fc256630, 0xc8bc4a8e10)

… because anything named fancy is just asking for trouble. But it seems fine, despite having no docs on why it’s fancy.

But working up the stack more, I see things with “timeout” in the name, which makes me think there are goroutines involved running ServeHTTP async. And sure enough:

k8s.io/kubernetes/pkg/apiserver/handlers.go:

func (t *timeoutHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        after, msg := t.timeout(r)
        if after == nil {
                t.handler.ServeHTTP(w, r)
                return
        }

        done := make(chan struct{}, 1)
        tw := newTimeoutWriter(w)
        go func() {
                t.handler.ServeHTTP(tw, r)
                done <- struct{}{}
        }()
        select {
        case <-done:
                return
        case <-after:
                tw.timeout(msg)
        }
}

… good, you’re passing a custom ResponseWriter to the new ServeHTTP started in the goroutine.

So I’d expect newTimeoutWriter to handle not calling the underlying Header method once the original ServeHTTP was done.

But:

type timeoutWriter interface {
        http.ResponseWriter
        timeout(string)
}

func newTimeoutWriter(w http.ResponseWriter) timeoutWriter {
        base := &baseTimeoutWriter{w: w}
        // ...
        return base
}

type baseTimeoutWriter struct {
        w http.ResponseWriter

        mu          sync.Mutex
        timedOut    bool
        wroteHeader bool
        hijacked    bool
}

func (tw *baseTimeoutWriter) Header() http.Header {
        return tw.w.Header()
}

The baseTimeoutWriter’s Header method doesn’t have any synchronization. It always accesses tw.w (which is the original ServeHTTP goroutine’s ResponseWriter), even after timeoutHandler.ServeHTTP has returned.

But https://golang.org/pkg/net/http/#ResponseWriter says:

A ResponseWriter may not be used after the Handler.ServeHTTP method has returned.

So a data race (as I imagine is the case with HTTP/1) and a panic (with HTTP/2) are both possible.

Arguably the HTTP/2 behavior is better, because it told you there’s a problem rather than leaving a silent data race that you might not notice unless the tests you run under the race detector were particularly thorough concurrency- and timeout-wise.

You’ll need to make your timeoutHandler and its timeoutWriter a bit more aware of each other and synchronize any method calls on the original ServeHTTP’s ResponseWriter with the original ServeHTTP’s completion.
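
One way to do that, as a minimal sketch only (the handlerDone field, lateHeader field, and setHandlerDone method are hypothetical names, not the actual Kubernetes fix): have the outer ServeHTTP mark the writer as finished under the existing mutex before it returns, and have Header() stop reaching through to tw.w after that point.

package apiserver

import (
        "net/http"
        "sync"
)

type baseTimeoutWriter struct {
        w http.ResponseWriter

        mu          sync.Mutex
        timedOut    bool
        wroteHeader bool
        hijacked    bool
        handlerDone bool        // hypothetical: set when the outer ServeHTTP returns
        lateHeader  http.Header // hypothetical: detached header handed out after that
}

// setHandlerDone would be called by timeoutHandler.ServeHTTP in its
// <-after branch, just before it returns, so that no later call from
// the inner goroutine reaches the original ResponseWriter.
func (tw *baseTimeoutWriter) setHandlerDone() {
        tw.mu.Lock()
        defer tw.mu.Unlock()
        tw.handlerDone = true
}

func (tw *baseTimeoutWriter) Header() http.Header {
        tw.mu.Lock()
        defer tw.mu.Unlock()
        if tw.handlerDone {
                // The original ResponseWriter may no longer be used; give the
                // inner handler a throwaway header map instead of panicking.
                if tw.lateHeader == nil {
                        tw.lateHeader = make(http.Header)
                }
                return tw.lateHeader
        }
        return tw.w.Header()
}

Write and WriteHeader would need the same guard, and the same flag could double as the timedOut signal; the point is simply that every delegation to tw.w happens under the mutex and only while the outer ServeHTTP is still running.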