kubernetes: APF improperly estimates cost for LIST aggregated calls

What happened?

I issued an API call like this:

2022-03-17 08:01:17.003 PDT
"HTTP" verb="LIST" URI="/apis/external.metrics.k8s.io/v1beta1/namespaces/ns-1/custom.googleapis.com%7Ctarget?labelSelector=resource.labels.namespace_id+in+%28default%2Cdummy%29%2Cresource.labels.pod_id%2Cresource.type%3Dgke_container"

Based on the kube-apiserver logs, I believe this call held maximumSeats for its entire duration, which was 10s because the service handling the call made a reentrant call at the same priority level.

In my opinion, we shouldn’t use maximumSeats for delegated calls: they consume few kube-apiserver resources and, practically speaking, can block for quite a long time for external reasons, tying up a significant part of the priority level.

What happens:

Such calls are not long-running according to the definition in https://github.com/kubernetes/kubernetes/blob/4348c8ecaf87d91503718a42a930c397c3c82569/cmd/kube-apiserver/app/server.go#L387-L390, so they are subject to throttling by priority and fairness.
We hit ObjectCountNotFoundErr because we don’t know how many objects of a type like “external.metrics.k8s.io/v1beta1” exist (which is expected), so the estimator returns maximumSeats: https://github.com/kubernetes/kubernetes/blob/4348c8ecaf87d91503718a42a930c397c3c82569/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/request/list_work_estimator.go#L70-L70

What did you expect to happen?

InitialSeats for LIST calls to external API services should be lower than maximumSeats (i.e. 10). Something like 1 seems more appropriate, since kube-apiserver only proxies the traffic, which takes little CPU and memory compared to other LIST calls.
LIST calls to external API services should not be handled by the “error handling path”; there should be an explicit check for them instead.

How can we reproduce it (as minimally and precisely as possible)?

From https://github.com/kubernetes/kubernetes/issues/108524, which describes an observed problem caused by this:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/triton-batch/s0-prometheus-gpu_utilization?labelSelector=scaledobject.keda.sh%2Fname%3Dtriton"

The second call will take 10 seats.

Anything else we need to know?

No response

