kubernetes: TestValidateOnlyStatus flakes with `apiVersion: Invalid value`

https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-integration-master/1135464872302088192

--- FAIL: TestValidateOnlyStatus (1.61s)
I0102 19:03:43.523081  125200 establishing_controller.go:84] Shutting down EstablishingController
I0102 19:03:43.523219  125200 secure_serving.go:156] Stopped listening on 127.0.0.1:46393
I0102 19:03:43.523225  125200 naming_controller.go:295] Shutting down NamingConditionController
I0102 19:03:43.523093  125200 crd_finalizer.go:254] Shutting down CRDFinalizer
I0102 19:03:43.523081  125200 customresource_discovery_controller.go:214] Shutting down DiscoveryController
testserver.go:141: runtime-config=map[api/all:true]
testserver.go:142: Starting apiextensions-apiserver on port 36085...
testserver.go:160: Waiting for /healthz to be ok...
subresources_test.go:531: unexpected error: WishIHadChosenNoxu.mygroup.example.com "foo" is invalid: apiVersion: Invalid value: "mygroup.example.com/v1beta1": must be mygroup.example.com/v1

cc @mbohlool @sttts

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

Found the root cause: the etcd store code that creates the new object to decode into was not propagating the desired hub GroupVersionKind (GVK):

https://github.com/kubernetes/kubernetes/blob/714fcd910fb7abf8647a6f67bfb15eeed10854a1/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go#L687-L690

As a result, an “old object” read from etcd always stayed in whatever version it had been stored in, rather than being converted to the in-memory hub version.
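To illustrate the effect, here is a minimal sketch, not the actual apiserver code. The `object` type, the `decode` function, and the two constructor functions are hypothetical stand-ins; the only assumption carried over from the issue is that the decoder converts to a target version only when the object it decodes into already carries the desired GVK.

```go
package main

import "fmt"

// object is a hypothetical, heavily simplified stand-in for a custom resource.
type object struct {
	APIVersion string
	Kind       string
	Name       string
}

// decode is a toy model of a versioning decoder (an assumption for this
// sketch, not the real apimachinery API): if the caller pre-populated
// into.APIVersion, the result is "converted" to that version; otherwise
// the object stays in the version it was stored in.
func decode(stored object, into *object) {
	target := into.APIVersion
	*into = stored
	if target != "" {
		into.APIVersion = target
	}
}

// newWithoutGVK models the buggy behaviour: an empty object, no hub GVK.
func newWithoutGVK() *object { return &object{} }

// newWithGVK models the intended behaviour: the hub GVK is propagated.
func newWithGVK() *object {
	return &object{APIVersion: "mygroup.example.com/v1", Kind: "WishIHadChosenNoxu"}
}

func main() {
	stored := object{
		APIVersion: "mygroup.example.com/v1beta1",
		Kind:       "WishIHadChosenNoxu",
		Name:       "foo",
	}

	buggy := newWithoutGVK()
	decode(stored, buggy)
	fmt.Println(buggy.APIVersion) // prints mygroup.example.com/v1beta1

	fixed := newWithGVK()
	decode(stored, fixed)
	fmt.Println(fixed.APIVersion) // prints mygroup.example.com/v1
}
```

In this toy model the only difference between the two runs is whether the object passed to `decode` carries a version, which mirrors the missing GVK propagation described above.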

This was masked by several things:

  • In almost all updates, the “current object” comes from the last-observed update in the REST version held by the watch cache, and is not read from etcd. That is why this test only rarely flaked, but failed 100% of the time with the watch cache disabled.
  • For PUT and POST requests to the primary resource (not a subresource), the apiVersion of the old object never reaches validation; only in patch requests can fields from the old object bleed through into validation.
  • For all requests to the status subresource of a custom resource, all fields from the old object except “status” are preserved. That’s why this was the test that flaked when we happened to get an unconverted object from etcd rather than the watch cache.
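The last point can be sketched as follows. This is a simplified model under stated assumptions, not the real custom resource strategy code: the `resource` type and the `prepareStatusUpdate` and `validateAPIVersion` helpers are hypothetical names chosen for illustration. It shows how an unconverted old object makes a status update fail with an error of the same shape as the one in the test log.

```go
package main

import "fmt"

// resource is a hypothetical, simplified custom resource.
type resource struct {
	APIVersion string
	Spec       string
	Status     string
}

// prepareStatusUpdate models a status-subresource update: every field is
// taken from the old object except Status, which comes from the request.
func prepareStatusUpdate(newObj, oldObj resource) resource {
	out := oldObj
	out.Status = newObj.Status
	return out
}

// validateAPIVersion rejects objects that are not in the expected version.
func validateAPIVersion(obj resource, expected string) error {
	if obj.APIVersion != expected {
		return fmt.Errorf("apiVersion: Invalid value: %q: must be %s", obj.APIVersion, expected)
	}
	return nil
}

func main() {
	const hub = "mygroup.example.com/v1"
	request := resource{APIVersion: hub, Spec: "ignored", Status: "ready"}

	// Old object served by the watch cache: already in the hub version.
	fromCache := resource{APIVersion: hub, Spec: "a", Status: "pending"}
	fmt.Println(validateAPIVersion(prepareStatusUpdate(request, fromCache), hub))

	// Old object read from etcd without conversion: still in v1beta1.
	fromEtcd := resource{APIVersion: "mygroup.example.com/v1beta1", Spec: "a", Status: "pending"}
	fmt.Println(validateAPIVersion(prepareStatusUpdate(request, fromEtcd), hub))
}
```

Because the merged object inherits `APIVersion` from the old object, the request's own (correct) version never gets a chance to override the stale one.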

Fixed the issue in https://github.com/kubernetes/kubernetes/pull/78713 and added tests that exercise this code path, ensuring the old object has the correct GVK both with and without the watch cache enabled.

figured it out… statusREST is not setting the expected kind in the New() func, so conversion doesn’t know a different in-memory version is expected.