kubernetes: Running conformance test on 1.29.0-alpha.3 cluster with sonobuoy fails

What happened?

@rtheis reported that conformance testing with the latest version of sonobuoy against a v1.29.0-alpha.3 cluster fails on the “support the 1.17 Sample API Server using the current Aggregator” test: https://github.com/kubernetes/kubernetes/pull/121283#issuecomment-1812399851, related to https://github.com/kubernetes/kubernetes/pull/121283.

I found that the test passes in Kubernetes CI jobs but fails with sonobuoy because the service account used by sonobuoy, “sonobuoy-serviceaccount”, doesn’t have read permission on the non-resource URL “/”, which the test uses to confirm the removal of the group path: https://github.com/kubernetes/kubernetes/blob/ec5096fa869b801d6eb1bf019819287ca61edc4d/test/e2e/apimachinery/aggregator.go#L744-L746

I1121 14:31:40.149085       1 rbac.go:119] RBAC: no rules authorize user "system:serviceaccount:sonobuoy:sonobuoy-serviceaccount" with groups ["system:serviceaccounts" "system:serviceaccounts:son
thenticated"] to "get" nonResourceURL "/" cluster-wide
I1121 14:31:40.149099       1 authorization.go:87] "Forbidden" URI="/" reason=""
I1121 14:31:40.149218       1 httplog.go:132] "HTTP" verb="GET" URI="/" latency="327.429µs" userAgent="e2e.test/v1.29.0 (linux/amd64) kubernetes/1f69e12 -- [sig-api-machinery] Aggregator Should b
 the 1.17 Sample API Server using the current Aggregator [Conformance]" audit-ID="46c27282-7af6-48d1-bf3e-ca439c57bc92" srcIP="172.18.0.4:36878" apf_pl="workload-low" apf_fs="service-accounts" ap
eats=0 apf_additionalLatency="0s" apf_execution_time="105.94µs" resp=403

The permissions of “sonobuoy-serviceaccount” are as follows: https://github.com/vmware-tanzu/sonobuoy/blob/6f9e27f1795f10475c9f6f5decdff692e1e228da/pkg/client/gen.go#L502-L505

cr.Rules = []v1.PolicyRule{
	{
		APIGroups: []string{"*"},
		Resources: []string{"*"},
		Verbs:     []string{"*"},
	},
	{
		NonResourceURLs: []string{"/metrics", "/logs", "/logs/*"},
		Verbs:           []string{"get"},
	},
}
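To illustrate why these rules do not cover “/”, here is a minimal sketch of non-resource URL matching. It assumes the usual RBAC semantics ("*" matches everything, an entry ending in "*" is a prefix match, anything else must match exactly). The policyRule type and the helper names are hypothetical stand-ins, not the real k8s.io/api/rbac/v1 types or the apiserver's own code.

```go
package main

import (
	"fmt"
	"strings"
)

// policyRule is a simplified, hypothetical stand-in for the nonResourceURLs
// half of an RBAC PolicyRule; it is not the real k8s.io/api/rbac/v1 type.
type policyRule struct {
	NonResourceURLs []string
	Verbs           []string
}

// verbAllowed reports whether the rule grants the verb ("*" grants all verbs).
func verbAllowed(rule policyRule, verb string) bool {
	for _, v := range rule.Verbs {
		if v == "*" || v == verb {
			return true
		}
	}
	return false
}

// urlAllowed reports whether the rule covers the path: "*" covers everything,
// an entry ending in "*" is a prefix match, anything else must match exactly.
func urlAllowed(rule policyRule, path string) bool {
	for _, u := range rule.NonResourceURLs {
		if u == "*" || u == path {
			return true
		}
		if strings.HasSuffix(u, "*") && strings.HasPrefix(path, strings.TrimRight(u, "*")) {
			return true
		}
	}
	return false
}

// allowed reports whether any rule grants verb on the non-resource path.
func allowed(rules []policyRule, verb, path string) bool {
	for _, r := range rules {
		if verbAllowed(r, verb) && urlAllowed(r, path) {
			return true
		}
	}
	return false
}

func main() {
	// Non-resource part of the sonobuoy ClusterRole quoted above. The
	// APIGroups/Resources wildcard rule is left out because it carries no
	// NonResourceURLs and so cannot match a non-resource request.
	sonobuoy := []policyRule{
		{NonResourceURLs: []string{"/metrics", "/logs", "/logs/*"}, Verbs: []string{"get"}},
	}
	for _, p := range []string{"/", "/metrics", "/logs/kubelet.log"} {
		fmt.Printf("get %-18s allowed=%v\n", p, allowed(sonobuoy, "get", p))
	}

	// Hypothetical fix: the same rule with "/" added.
	fixed := []policyRule{
		{NonResourceURLs: []string{"/", "/metrics", "/logs", "/logs/*"}, Verbs: []string{"get"}},
	}
	fmt.Println("after fix, get / allowed:", allowed(fixed, "get", "/"))
}
```

Under these assumptions, the wildcard in APIGroups and Resources does not help: it applies only to resource requests, so “/” needs its own entry in NonResourceURLs.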

There may be two ways to fix it:

  1. Update the RBAC rule of sonobuoy to include “get” of “/”. However, could other test runners encounter this issue? Is there a doc listing the required permissions for running e2e tests?
  2. Update the e2e test to skip checking the removal of the group path if the service account doesn’t have permission to access the root path.

Which way do we usually prefer?
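For reference, option 2 could be sketched roughly as below. This is not the code in aggregator.go: verifyGroupPathRemoved and errRootCheckSkipped are hypothetical names, the helper takes the HTTP status and the "paths" list from GET / as plain arguments, and the group path in main is an assumed value for the sample API server.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errRootCheckSkipped marks the "cannot read /" case so a caller can skip
// the assertion rather than fail the test.
var errRootCheckSkipped = errors.New("GET / forbidden; skipping group path check")

// verifyGroupPathRemoved is a hypothetical helper. status and paths are the
// HTTP status and the "paths" list returned by GET /; groupPath is the
// aggregated API's group path that should no longer be listed.
func verifyGroupPathRemoved(status int, paths []string, groupPath string) error {
	switch {
	case status == http.StatusForbidden:
		// The caller lacks permission on "/": report a skip, not a failure.
		return errRootCheckSkipped
	case status != http.StatusOK:
		return fmt.Errorf("GET / returned unexpected status %d", status)
	}
	for _, p := range paths {
		if p == groupPath {
			return fmt.Errorf("group path %q is still listed at /", groupPath)
		}
	}
	return nil
}

func main() {
	const group = "/apis/wardle.example.com" // assumed sample API server group path

	cases := []struct {
		status int
		paths  []string
	}{
		{http.StatusForbidden, nil},
		{http.StatusOK, []string{"/api", "/apis"}},
		{http.StatusOK, []string{"/api", group}},
	}
	for _, c := range cases {
		err := verifyGroupPathRemoved(c.status, c.paths, group)
		switch {
		case errors.Is(err, errRootCheckSkipped):
			fmt.Println("skip:", err)
		case err != nil:
			fmt.Println("fail:", err)
		default:
			fmt.Println("ok")
		}
	}
}
```

The trade-off is that a skipped check no longer verifies the bugfix for runners without access to “/”, which is part of what makes the choice between the two options a policy question.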

What did you expect to happen?

Running e2e with sonobuoy should succeed.

How can we reproduce it (as minimally and precisely as possible)?

  1. Deploy a v1.29.0-alpha.3 cluster.
  2. Run sonobuoy run --e2e-focus "Sample API Server using the current Aggregator"

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Server Version: version.Info{Major:"1", Minor:"29+", GitVersion:"v1.29.0-alpha.3", GitCommit:"1f69e121482db6664e4a1c1d21ec4dcf2b36b080", GitTreeState:"clean", BuildDate:"2023-11-02T18:24:31Z", Go
", Compiler:"gc", Platform:"linux/amd64"}

About this issue

  • State: closed
  • Created 7 months ago
  • Comments: 21 (20 by maintainers)

Most upvoted comments

It looks like the e2e change made in https://github.com/kubernetes/kubernetes/pull/121283 was not consciously made part of conformance; it was just trying to functionally test the bugfix being made.

For 1.29, I think we should revert the change to test/e2e/apimachinery/aggregator.go and make an integration test that validates the fix instead. We can revisit for 1.30 whether the content of / is part of conformance.

@neolit123 +1 for what @BenTheElder is saying, that we should expect the user running the tests to have the required permissions. These tests are not built for being run unprivileged.

We probably need another issue to track 1.30 follow-up?

@BenTheElder If no one has done it, I could create another issue and follow up with the items that have been discussed here:

  1. Update the outdated privilege requirement in https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md.
  2. Make commonly used conformance tools like sonobuoy aware of the required privileges to avoid similar situations in the future.
  3. Add permission to GET “/” for authorized users via the system:discovery role for root discovery.

Would have expected it here:

{
	// a role which provides just enough power to determine if the server is
	// ready and discover API versions for negotiation
	ObjectMeta: metav1.ObjectMeta{Name: "system:discovery"},
	Rules: []rbacv1.PolicyRule{
		rbacv1helpers.NewRule("get").URLs(
			"/livez", "/readyz", "/healthz",
			"/version", "/version/",
			"/openapi", "/openapi/*",
			"/api", "/api/*",
			"/apis", "/apis/*",
		).RuleOrDie(),
	},
},

The PR in question explicitly fixes the result returned by /. If that is part of conformance (IMO it should be), every conformance test run obviously needs access to it. Whether we require admin or add the permission manually, either would work.

In other words, just add the / permission. While thinking about it, it is surprising that the user in question does not have this access already. Don’t we allow that for every authorized user?

looks like we don’t.

$ kubectl get clusterrole cluster-admin -o yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: "2023-11-24T18:34:38Z"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: cluster-admin
  resourceVersion: "1206"
  uid: 13836a28-6c0d-4008-b566-bb528011e6e0
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
- nonResourceURLs:
  - /api # <------------- modified from the default of '*'
  verbs:
  - '*'

$ kubectl get --raw /
Error from server (Forbidden): forbidden: User "kubernetes-admin" cannot get path "/"

$ kubectl get --raw /api
{"kind":"APIVersions","versions":["v1"],"serverAddressByClientCIDRs":[{"clientCIDR":"0.0.0.0/0","serverAddress":"10.0.2.15:6443"}]}

In contrast, I hope that lots of privileged operations are tested by conformance today. If we don’t test things like admission webhooks for example, we have much bigger problems. And admission webhooks are basically equivalent to cluster-admin.

before this ticket, i was under the impression that the conformance suite already requires admin and that all testers (like Sonobuoy) use admin level credentials.

it is non-privileged (e.g., does not require root on nodes, access to raw network interfaces, or cluster admin permissions)

that is outdated, we use hostNetwork pods on some of the tests IIRC

looks like we may have to change that sentence completely.

IMHO this is a bug in running the tests via a pod, which is not how we develop the tests (they’re run outside the cluster, by a cluster admin account typically).

I would say 1) is the correct answer. xref: #121986 (comment)

@BenTheElder thanks for your comment. I guess 1) means none of the existing sonobuoy versions can pass the K8s conformance tests against a cluster > 1.29.0-alpha.3, as the RBAC rule is hardcoded. I will file an issue with sonobuoy first.