kubernetes: API server panics when writing response
When I was running a large-scale kubemark test (2000 Nodes with 95 pods each), my API server crashed with the following error:
I0715 12:14:23.387809 3511 trace.go:61] Trace "List /api/v1/namespaces/e2e-tests-density-idftq/pods" (started 2016-07-15 12:13:01.712298253 +0000 UTC):
[102.412µs] [102.412µs] About to List from storage
[2.53469971s] [2.534597298s] Listing from storage done
[2.954350982s] [419.651272ms] Self-linking done
[1m21.675466193s] [1m18.721115211s] Writing http response done (130003 items)
[1m21.675467856s] [1.663µs] END
E0715 12:14:23.393199 3511 apiserver.go:210] recover from panic situation: - Header called after Handler finished
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:206
/usr/local/go/src/runtime/asm_amd64.s:472
/usr/local/go/src/runtime/panic.go:443
/usr/local/go/src/net/http/h2_bundle.go:4452
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/httplog/log.go:170
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:238
<autogenerated>:8
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/metrics/metrics.go:102
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:272
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:120
/usr/local/go/src/net/http/server.go:1618
/usr/local/go/src/net/http/server.go:1910
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:485
/usr/local/go/src/net/http/server.go:1618
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:420
/usr/local/go/src/net/http/server.go:1618
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/auth/handlers/handlers.go:66
/usr/local/go/src/net/http/server.go:1618
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/api/requestcontext.go:101
/usr/local/go/src/net/http/server.go:1618
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/api/requestcontext.go:101
/usr/local/go/src/net/http/server.go:1618
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:194
/usr/local/go/src/runtime/asm_amd64.s:1998
I0715 12:14:23.387986 3511 handlers.go:164] GET /api/v1/nodes/10.180.12.88: (144.600059ms) 200 [[kubemark/v1.4.0 (linux/amd64) kubernetes/12483d9] 146.148.109.243:33217]
I0715 12:14:23.388150 3511 handlers.go:164] GET /api/v1/nodes/10.180.1.92: (154.464521ms) 200 [[kubemark/v1.4.0 (linux/amd64) kubernetes/12483d9] 104.197.214.41:58039]
panic: Header called after Handler finished [recovered]
panic: Header called after Handler finished
goroutine 54829830 [running]:
panic(0x20f0220, 0xc821e07010)
/usr/local/go/src/runtime/panic.go:481 +0x3e6
net/http.(*http2responseWriter).Header(0xc956288948, 0x7fe725a76c88)
/usr/local/go/src/net/http/h2_bundle.go:4452 +0x78
k8s.io/kubernetes/pkg/httplog.(*respLogger).Header(0xc9702bd260, 0x0)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/httplog/log.go:170 +0x3a
k8s.io/kubernetes/pkg/apiserver.(*baseTimeoutWriter).Header(0xc9c39fb9e0, 0x0)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:238 +0x39
k8s.io/kubernetes/pkg/apiserver.logStackOnRecover(0x7fe725d5b8a8, 0xc82014ad90, 0x20f0220, 0xc8f69cd900, 0x7fe725b36be8, 0xc8fc256620)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/apiserver.go:213 +0x2d9
k8s.io/kubernetes/pkg/apiserver.InstallRecoverHandler.func1(0x20f0220, 0xc8f69cd900, 0x7fe725b36be8, 0xc8fc256620)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/apiserver.go:195 +0x5e
k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.(*Container).dispatch.func2(0xc82032cea0, 0xc83a235710)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:206 +0x6a
panic(0x20f0220, 0xc8f69cd900)
/usr/local/go/src/runtime/panic.go:443 +0x4e9
net/http.(*http2responseWriter).Header(0xc956288948, 0x2)
/usr/local/go/src/net/http/h2_bundle.go:4452 +0x78
k8s.io/kubernetes/pkg/httplog.(*respLogger).Header(0xc9702bd260, 0x2decc58)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/httplog/log.go:170 +0x3a
k8s.io/kubernetes/pkg/apiserver.(*baseTimeoutWriter).Header(0xc9c39fb9e0, 0x0)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:238 +0x39
k8s.io/kubernetes/pkg/apiserver/metrics.(*fancyResponseWriterDelegator).Header(0xc8fc256630, 0xc8bc4a8e10)
<autogenerated>:8 +0x53
k8s.io/kubernetes/pkg/apiserver/metrics.InstrumentRouteFunc.func1(0xc8bee93a10, 0xc90d296780)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/metrics/metrics.go:102 +0x35c
k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.(*Container).dispatch(0xc82032cea0, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:272 +0xf30
k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.(*Container).(k8s.io/kubernetes/vendor/github.com/emicklei/go-restful.dispatch)-fm(0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:120 +0x3e
net/http.HandlerFunc.ServeHTTP(0xc820ceb980, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/usr/local/go/src/net/http/server.go:1618 +0x3a
net/http.(*ServeMux).ServeHTTP(0xc8202c07b0, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/usr/local/go/src/net/http/server.go:1910 +0x17d
k8s.io/kubernetes/pkg/apiserver.WithAuthorizationCheck.func1(0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:485 +0xf4
net/http.HandlerFunc.ServeHTTP(0xc820313ac0, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/apiserver.WithImpersonation.func1(0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:420 +0xfe
net/http.HandlerFunc.ServeHTTP(0xc820313b00, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/auth/handlers.NewRequestAuthenticator.func1(0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/auth/handlers/handlers.go:66 +0x2c9
net/http.HandlerFunc.ServeHTTP(0xc8203f5270, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/api.NewRequestContextFilter.func1(0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/api/requestcontext.go:101 +0x157
net/http.HandlerFunc.ServeHTTP(0xc820522660, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/api.NewRequestContextFilter.func1(0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/api/requestcontext.go:101 +0x157
net/http.HandlerFunc.ServeHTTP(0xc820522680, 0x7fe725b36be8, 0xc8fc256620, 0xc83c513260)
/usr/local/go/src/net/http/server.go:1618 +0x3a
k8s.io/kubernetes/pkg/apiserver.(*timeoutHandler).ServeHTTP.func1(0xc8216ec9c0, 0x7fe725b36ba8, 0xc8fc256620, 0xc83c513260, 0xc90d296720)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:194 +0x91
created by k8s.io/kubernetes/pkg/apiserver.(*timeoutHandler).ServeHTTP
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:196 +0x15b
It happened when there were around 110k-120k Pods in the cluster.
It looks like a bug in http2 handling. cc @timothysc @smarterclayton @wojtek-t @lavalamp @kubernetes/sig-api-machinery
About this issue
- State: closed
- Created 8 years ago
- Comments: 50 (45 by maintainers)
Commits related to this issue
- Merge pull request #29032 from timothysc/http2_godep_update Automatic merge from submit-queue Update golang.org/x/net godep for http2 issues being seen during scale This is to address https://githu... — committed to kubernetes/kubernetes by k8s-github-robot 8 years ago
- Merge pull request #29283 from timothysc/http2_default_disable Automatic merge from submit-queue Revert "Follow on for 1.4 to default HTTP2 on by default" This reverts commit efe25553cdb28c8ce69e8e... — committed to kubernetes/kubernetes by k8s-github-robot 8 years ago
- Merge pull request #29594 from xiang90/fix_panic Automatic merge from submit-queue apiserver: fix timeout handler Protect access of the original writer. Panics if anything has wrote into the origi... — committed to kubernetes/kubernetes by deleted user 8 years ago
- Merge pull request #32231 from timothysc/re-re-http2 Automatic merge from submit-queue Default HTTP2 on, post fixes from #29001 This reverts commit 8cb799c7897c029863cd9051b8e033991d29fe88. Enab... — committed to kubernetes/kubernetes by deleted user 8 years ago
Hi, I own Go’s net/http and http2 stuff.
I saw https://github.com/golang/go/issues/16451 but it appears that the bug is actually in Kubernetes.
When you get the error message:
“panic: Header called after Handler finished”
That means the ResponseWriter.Header() method was called after the ServeHTTP completed.
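To make that concrete, here is a minimal standalone sketch (not Kubernetes code) of the kind of misuse the message describes: a goroutine that outlives ServeHTTP and keeps using its ResponseWriter.

```go
package main

import (
	"net/http"
	"time"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		go func() {
			time.Sleep(100 * time.Millisecond)
			// Illegal: by now ServeHTTP has almost certainly returned.
			// Go's HTTP/2 server panics with "Header called after Handler
			// finished"; the HTTP/1 server gives you a silent data race.
			w.Header().Set("X-Too-Late", "true")
			w.Write([]byte("late body"))
		}()
		// The handler returns immediately, racing with the goroutine above.
	})
	// Note: HTTP/2 is only negotiated over TLS, so reproducing the panic
	// (rather than just the race) needs ListenAndServeTLS with certificates.
	http.ListenAndServe(":8080", nil)
}
```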
So I went up your panic’s call stack looking for suspects.
My first guess was fancyResponseWriterDelegator … because anything named fancy is just asking for trouble. But it seems fine, despite having no docs on why it’s fancy.
But working up the stack more, I see things with “timeout” in the name, which makes me think there are goroutines involved running ServeHTTP async. And sure enough:
k8s.io/kubernetes/pkg/apiserver/handlers.go:
… good, you’re passing a custom ResponseWriter to the new ServeHTTP started in the goroutine.
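(The handlers.go excerpt isn’t reproduced here; roughly, the pattern being described looks like the sketch below, with names borrowed from the stack trace rather than taken from the actual Kubernetes source. newTimeoutWriter and tw.timeout() are sketched further down.)

```go
// Illustrative sketch only (imports: net/http, time): the timeout handler
// runs the wrapped handler's ServeHTTP on a fresh goroutine and hands it a
// custom ResponseWriter, while the original goroutine waits for either
// completion or the deadline.
type timeoutHandler struct {
	handler http.Handler
	timeout time.Duration
}

func (t *timeoutHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	tw := newTimeoutWriter(w) // wraps the original ResponseWriter
	done := make(chan struct{})
	go func() {
		t.handler.ServeHTTP(tw, r) // inner handler only ever sees tw
		close(done)
	}()
	select {
	case <-done:
		// Inner handler finished in time.
	case <-time.After(t.timeout):
		// Deadline hit: report the timeout and return, even though the
		// inner goroutine may still be running and calling methods on tw.
		tw.timeout()
	}
}
```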
So I’d expect
newTimeoutWriterto handle not calling the underlyingHeadermethod once the original ServeHTTP was done.But:
The baseTimeoutWriter’s Header method doesn’t have any synchronization. It always accesses tw.w (which is the original ServeHTTP goroutine’s ResponseWriter), even after timeoutHandler.ServeHTTP has returned.
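Presumably something along these lines (again a sketch, not the real source):

```go
type baseTimeoutWriter struct {
	w http.ResponseWriter // owned by the outer ServeHTTP goroutine
}

// No locking and no "are we done?" check: if the inner handler calls this
// after the outer ServeHTTP has returned, it touches w when it no longer may.
func (tw *baseTimeoutWriter) Header() http.Header {
	return tw.w.Header()
}
```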
But https://golang.org/pkg/net/http/#ResponseWriter says a ResponseWriter may not be used after the Handler has returned.
So, a data race (as is the case in HTTP/1 I imagine) or a panic (with HTTP/2) are both possible.
Arguably the HTTP/2 behavior is better, because it told you there’s a problem rather than leaving a silent data race which you might not notice unless your tests, run under the race detector, were particularly thorough concurrency/timeout-wise.
You’ll need to make your timeoutHandler and its timeoutWriter a bit more aware of each other and synchronize any method calls on the original ServeHTTP’s ResponseWriter with the original ServeHTTP’s completion.
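One way to do that, sketched under the assumption of a mutex-plus-flag design (illustrative only, not necessarily the fix that actually landed via #29594):

```go
// (imports: net/http, sync)

// timeoutWriter guards every access to the original ResponseWriter and stops
// forwarding to it once the outer ServeHTTP has timed out.
type timeoutWriter struct {
	mu          sync.Mutex
	w           http.ResponseWriter
	timedOut    bool
	wroteHeader bool
}

func newTimeoutWriter(w http.ResponseWriter) *timeoutWriter {
	return &timeoutWriter{w: w}
}

func (tw *timeoutWriter) Header() http.Header {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if tw.timedOut {
		// The original writer is off-limits now; hand back a throwaway map.
		return http.Header{}
	}
	return tw.w.Header()
}

func (tw *timeoutWriter) Write(p []byte) (int, error) {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if tw.timedOut {
		return 0, http.ErrHandlerTimeout
	}
	tw.wroteHeader = true
	return tw.w.Write(p)
}

func (tw *timeoutWriter) WriteHeader(code int) {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if tw.timedOut {
		return
	}
	tw.wroteHeader = true
	tw.w.WriteHeader(code)
}

// timeout is called by the outer ServeHTTP, still on the original goroutine,
// when the deadline fires. Writing the 504 here is legal; afterwards the
// timedOut flag guarantees nothing touches tw.w again.
func (tw *timeoutWriter) timeout() {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if !tw.wroteHeader {
		tw.w.WriteHeader(http.StatusGatewayTimeout)
	}
	tw.timedOut = true
}
```

The important property is that once timeout() has run, every later call from the inner handler is absorbed by the wrapper, so nothing can reach the original ResponseWriter after the outer ServeHTTP returns; late writes surface to the inner handler as http.ErrHandlerTimeout instead of a panic or a data race.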