kubernetes: absence of openapi configuration in integration tests makes server-side apply panic (broke apf controller when it switched to SSA)

What happened?

Integration test timed out due to the following error:

E0120 14:05:02.398256  118058 runtime.go:77] Observed a panic: FieldManager must be installed to run apply

The error appeared 748 times in the log. It looks like a lot of churn happens during the APF configuration bootstrapping phase. Is there a race condition between when the type is registered and when the API calls are made?

Stack trace:

I0120 14:05:02.397911  118058 apf_controller.go:879] Triggered API priority and fairness config reloading because priority level exempt is undesired and idle
I0120 14:05:02.398135  118058 panic.go:1038] "HTTP" verb="APPLY" URI="/apis/flowcontrol.apiserver.k8s.io/v1beta2/flowschemas/system-node-high/status?fieldManager=api-priority-and-fairness-config-consumer-v1&force=true" latency="650.951µs" userAgent="Go-http-client/1.1" audit-ID="bc3c561e-1a80-403e-b611-63ba7bd484b9" srcIP="127.0.0.1:46572" apf_pl="exempt" apf_fs="exempt" apf_fd="" resp=0
E0120 14:05:02.398256  118058 runtime.go:77] Observed a panic: FieldManager must be installed to run apply
goroutine 136765 [running]:
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/finisher.finishRequest.func1.1()
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/finisher/finisher.go:105 +0xaf
panic({0x3fc0bc0, 0x542eb00})
	/usr/local/go/src/runtime/panic.go:1038 +0x215
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers.(*applyPatcher).applyPatchToCurrentObject(0x551ceb0, {0x553da98, 0xc0887ccc00}, {0x551ceb0, 0xc0887da480})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/patch.go:482 +0x449
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers.(*patcher).applyPatch(0xc0bf781a40, {0x553da98, 0xc0887ccc00}, {0x4c296a0, 0xc0887da480}, {0x551ceb0, 0xc0887da480})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/patch.go:566 +0xd3
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/rest.(*defaultUpdatedObjectInfo).UpdatedObject(0x81f4878, {0x553da98, 0xc0887ccc00}, {0x551ceb0, 0xc0887da480})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/rest/update.go:229 +0xd0
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/generic/registry.(*Store).Update.func1({0x551ceb0, 0xc0887da480}, {0xc07278e397b5dfde, 0x8b5f27eafb})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/generic/registry/store.go:533 +0x1f9
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/etcd3.(*store).updateState(0xc03d204bd0, 0xc06ba15310, 0x42)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/etcd3/store.go:894 +0x3e
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/etcd3.(*store).GuaranteedUpdate(0xc03d204bd0, {0x553da98, 0xc0887ccc00}, {0xc01908f2e0, 0x1d}, {0x551ceb0, 0xc0887da300}, 0x1, 0x40ef94, 0xc0848d34a0, ...)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/etcd3/store.go:365 +0x56e
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.(*Cacher).GuaranteedUpdate(0xc0e9754c60, {0x553da98, 0xc0887ccc00}, {0xc01908f2e0, 0x1d}, {0x551ceb0, 0xc0887da300}, 0xa8, 0x4b86ae0, 0xc0848d34a0, ...)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher/cacher.go:721 +0x1b5
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/generic/registry.(*DryRunnableStorage).GuaranteedUpdate(0x4cbc218, {0x553da98, 0xc0887ccc00}, {0xc01908f2e0, 0x4cac0b0}, {0x551ceb0, 0xc0887da300}, 0xf2, 0x1, 0xc0848d34a0, ...)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/generic/registry/dryrun.go:97 +0x1c7
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/generic/registry.(*Store).Update(0xc0e5c5b2c0, {0x553da98, 0xc0887ccc00}, {0xc0412d2b3d, 0xb}, {0x5523328, 0xc0887cccf0}, 0xc0105369b0, 0x4ea4ac8, 0x0, ...)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/generic/registry/store.go:521 +0x508
k8s.io/kubernetes/pkg/registry/flowcontrol/flowschema/storage.(*StatusREST).Update(0xc0887ccd50, {0x553da98, 0xc0887ccc00}, {0xc0412d2b3d, 0xc086dd5bf0}, {0x5523328, 0xc0887cccf0}, 0xc0991a2748, 0x40ef94, 0x1, ...)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/registry/flowcontrol/flowschema/storage/storage.go:93 +0x52
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers.(*patcher).patchResource.func2()
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/patch.go:665 +0xa7
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers.(*patcher).patchResource.func3()
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/patch.go:671 +0x38
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/finisher.finishRequest.func1()
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/finisher/finisher.go:117 +0x8f
created by k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/finisher.finishRequest
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/finisher/finisher.go:92 +0xe5

Integration job link: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/107456/pull-kubernetes-integration/1484156493719670784 PR: https://github.com/kubernetes/kubernetes/pull/107456

I searched CI and this PR for this error, and so far it only appears in this PR, so I am hoping it has not introduced any flakes. (Maybe we need to let more time pass before flakes start showing up.)

What did you expect to happen?

We should not see the error "FieldManager must be installed to run apply".

How can we reproduce it (as minimally and precisely as possible)?

I saw it only in the PR mentioned.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 47 (45 by maintainers)

Most upvoted comments

Taking a fresh look at this PR and wondering if it has been mostly fixed already?

From @liggitt’s comments, my understanding is that this panic occurred when two conditions held:

  1. The kube-apiserver running as part of the integration test suite is asked to serve an SSA request while its OpenAPIModels are nil. The apiserver has logic to panic in this case (see the sketch after this list). From the code path @liggitt pointed to, OpenAPIModels is nil when the OpenAPIConfig the server is created with is nil.
  2. An APF worker (running alongside an integration test) issues an SSA request to the test apiserver. Since the apiserver is misconfigured, it panics. The panic only shows up sporadically because sometimes the tests finish before the APF worker gets to issue the request.
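
As a minimal, self-contained sketch of the first condition (illustrative only, not the actual apiserver code): the apply handler needs a field manager built from the server's OpenAPI models, and it panics with exactly the message seen above when that field manager was never installed.

package main

import "fmt"

// fieldManager would normally be built from the server's OpenAPI models.
type fieldManager struct{}

type applyPatcher struct {
	// nil when the server was constructed with a nil OpenAPIConfig.
	fieldManager *fieldManager
}

func (p *applyPatcher) applyPatchToCurrentObject() error {
	if p.fieldManager == nil {
		// Mirrors the panic in the stack trace above.
		panic("FieldManager must be installed to run apply")
	}
	return nil
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("observed a panic:", r)
		}
	}()
	p := &applyPatcher{} // misconfigured server: no field manager installed
	_ = p.applyPatchToCurrentObject()
}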

As a quick check on whether this might be fixed, I ran the test TestUnschedulableNodeDaemonDoesLaunchPod that @Jefftree mentioned was failing because of SSA. I logged the Nodes it created and found that they all had managedFields, which I took as an indication that SSA was now functioning properly.
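
A rough sketch of that kind of check, assuming an existing client-go clientset inside the integration test (the helper name assertNodesHaveManagedFields is mine, not something from the test suite):

package integration

import (
	"context"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Objects written through a correctly configured apiserver should carry
// managedFields entries once server-side apply / field tracking is active.
func assertNodesHaveManagedFields(t *testing.T, client kubernetes.Interface) {
	t.Helper()
	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		t.Fatalf("listing nodes: %v", err)
	}
	for _, n := range nodes.Items {
		if len(n.ManagedFields) == 0 {
			t.Errorf("node %q has no managedFields; field tracking may not be wired up", n.Name)
		}
	}
}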

Indeed, to start the test server the setup function calls kubeapiservertesting.StartTestServerOrDie. This function has a code path which unconditionally leads to the apiserver’s OpenAPIConfig being set.

StartTestServer calls CreateServerChain before running the returned apiserver:

https://github.com/kubernetes/kubernetes/blob/6f706775bcb0007082ca940527a154e728b4399f/cmd/kube-apiserver/app/testing/testserver.go#L212-L231

CreateServerChain calls CreateKubeAPIServerConfig before using the returned config to create the returned apiserver:

https://github.com/kubernetes/kubernetes/blob/0527a0dd453c4b76259389ec8e8e6888c5e2a5ab/cmd/kube-apiserver/app/server.go#L176-L195

CreateKubeAPIServerConfig uses buildGenericConfig which sets OpenAPIConfig:

https://github.com/kubernetes/kubernetes/blob/0527a0dd453c4b76259389ec8e8e6888c5e2a5ab/cmd/kube-apiserver/app/server.go#L237-L248

https://github.com/kubernetes/kubernetes/blob/0527a0dd453c4b76259389ec8e8e6888c5e2a5ab/cmd/kube-apiserver/app/server.go#L388-L396
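
Paraphrasing the linked lines (a sketch from memory, so exact arguments and import paths may differ at that commit), buildGenericConfig always installs an OpenAPI config, which is what allows the SSA field manager to be created:

package sketch

import (
	openapinamer "k8s.io/apiserver/pkg/endpoints/openapi"
	genericapiserver "k8s.io/apiserver/pkg/server"
	"k8s.io/kubernetes/pkg/api/legacyscheme"
	generatedopenapi "k8s.io/kubernetes/pkg/generated/openapi"
)

func buildGenericConfigSketch() *genericapiserver.Config {
	genericConfig := genericapiserver.NewConfig(legacyscheme.Codecs)

	// Set unconditionally: any server built through this path has OpenAPI models,
	// so the field manager can be installed and the panic above cannot occur.
	namer := openapinamer.NewDefinitionNamer(legacyscheme.Scheme)
	genericConfig.OpenAPIConfig = genericapiserver.DefaultOpenAPIConfig(generatedopenapi.GetOpenAPIDefinitions, namer)
	genericConfig.OpenAPIConfig.Info.Title = "Kubernetes"

	return genericConfig
}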

Thus, any test using kubeapiservertesting.StartTestServerOrDie should no longer be affected by this panic. The question then becomes: do all integration tests start their servers in this way?

It’s hard to say. I found a PR (#110529) which refactors many occurrences of the old stanza:

controlPlaneConfig := framework.NewIntegrationTestControlPlaneConfig()
_, server, closeFn := framework.RunAnAPIServer(controlPlaneConfig)

Into something using the new method:

server := kubeapiservertesting.StartTestServerOrDie(t, nil, nil, framework.SharedEtcd())
defer server.TearDownFn()

But it appears framework.RunAnAPIServer is still being used in a few integration tests.

Should these occurrences be refactored to use kubeapiservertesting.StartTestServerOrDie too?