kubernetes: APF improperly estimates cost for LIST aggregated calls
What happened?
I issued an API call like the following:
2022-03-17 08:01:17.003 PDT
"HTTP" verb="LIST" URI="/apis/external.metrics.k8s.io/v1beta1/namespaces/ns-1/custom.googleapis.com%7Ctarget?labelSelector=resource.labels.namespace_id+in+%28default%2Cdummy%29%2Cresource.labels.pod_id%2Cresource.type%3Dgke_container"
Based on the kube-apiserver logs, I believe this call held maximumSeats for its entire duration (10s, because the service handling the call made a reentrant call at the same priority level).
In my opinion we shouldn't use maximumSeats for delegated calls: they consume few kube-apiserver resources and, practically speaking, can block for a long time for external reasons, tying up a significant part of the priority level.
What happens:
- Such calls are not long-running according to the definition in https://github.com/kubernetes/kubernetes/blob/4348c8ecaf87d91503718a42a930c397c3c82569/cmd/kube-apiserver/app/server.go#L387-L390, so they are subject to throttling by priority and fairness.
- We hit ObjectCountNotFoundErr because we don't know how many objects of a type like "external.metrics.k8s.io/v1beta1" exist (which is expected), so the estimator returns maximumSeats: https://github.com/kubernetes/kubernetes/blob/4348c8ecaf87d91503718a42a930c397c3c82569/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/request/list_work_estimator.go#L70-L70
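The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the actual kube-apiserver code: the names `estimateListSeats`, `objectCount`, `objectCounts` and `errObjectCountNotFound` are hypothetical stand-ins, and the "one seat per 100 objects" formula is an assumption made only to give known resources a plausible estimate.

```go
package main

import (
	"errors"
	"fmt"
)

// maximumSeats is the cap mentioned in this issue (10).
const maximumSeats = 10

// errObjectCountNotFound is a hypothetical stand-in for ObjectCountNotFoundErr.
var errObjectCountNotFound = errors.New("object count not found")

// objectCounts simulates a tracker that only knows about resources served
// from kube-apiserver's own storage. Aggregated APIs are never in it.
var objectCounts = map[string]int64{
	"pods": 250,
}

// objectCount returns the tracked object count or a not-found error.
func objectCount(resource string) (int64, error) {
	n, ok := objectCounts[resource]
	if !ok {
		return 0, errObjectCountNotFound
	}
	return n, nil
}

// estimateListSeats shows the shape of the problem: any lookup error,
// including "not found" for an aggregated API, falls back to maximumSeats.
func estimateListSeats(resource string) uint64 {
	n, err := objectCount(resource)
	if err != nil {
		return maximumSeats
	}
	// Assumed formula for this sketch: one seat per 100 objects, rounded up.
	seats := uint64((n + 99) / 100)
	if seats < 1 {
		seats = 1
	}
	if seats > maximumSeats {
		seats = maximumSeats
	}
	return seats
}

func main() {
	fmt.Println(estimateListSeats("pods"))                            // 3
	fmt.Println(estimateListSeats("external.metrics.k8s.io/v1beta1")) // 10
}
```

In this sketch the aggregated resource gets the most expensive estimate purely because the count lookup fails, not because the request is known to be costly.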
What did you expect to happen?
- InitialSeats for LIST calls to external API services is lower than maximumSeats (i.e. 10). Something like 1 would be more appropriate, since kube-apiserver only redirects the traffic, which costs little CPU and memory compared to other LIST calls.
- LIST calls to external API services are not handled by the "error handling path"; instead there is an explicit check for them.
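One way the expected behavior could look is sketched below. It is an assumption-laden illustration rather than the merged mitigation: the `listRequest` type, its `Aggregated` field, and the `minimumSeats` constant are hypothetical, and how the estimator would actually learn that a request is delegated is left open.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	minimumSeats = 1  // hypothetical floor used for delegated calls
	maximumSeats = 10 // cap mentioned in this issue
)

// errObjectCountNotFound is a hypothetical stand-in for ObjectCountNotFoundErr.
var errObjectCountNotFound = errors.New("object count not found")

// listRequest is a hypothetical, trimmed-down view of a LIST request.
type listRequest struct {
	Resource   string
	Aggregated bool // true when the call is delegated to an external API service
}

// objectCount pretends that no count is tracked for any resource, so every
// non-aggregated lookup in this sketch takes the error path.
func objectCount(resource string) (int64, error) {
	return 0, errObjectCountNotFound
}

// estimateListSeats checks for delegated calls explicitly, before the
// object-count lookup, so they never reach the error-handling path.
func estimateListSeats(req listRequest) uint64 {
	if req.Aggregated {
		return minimumSeats
	}
	if _, err := objectCount(req.Resource); err != nil {
		// Unknown count for a locally served resource: stay conservative.
		return maximumSeats
	}
	return minimumSeats
}

func main() {
	ext := listRequest{Resource: "external.metrics.k8s.io/v1beta1", Aggregated: true}
	local := listRequest{Resource: "widgets", Aggregated: false}
	fmt.Println(estimateListSeats(ext))   // 1
	fmt.Println(estimateListSeats(local)) // 10
}
```

Putting the check first keeps the conservative fallback for locally served resources whose count is genuinely unknown, while delegated calls get a small fixed cost.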
How can we reproduce it (as minimally and precisely as possible)?
From https://github.com/kubernetes/kubernetes/issues/108524, which describes an observed problem caused by this:
- kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
- kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/triton-batch/s0-prometheus-gpu_utilization?labelSelector=scaledobject.keda.sh%2Fname%3Dtriton"
The second call will take 10 seats.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 21 (18 by maintainers)
The mitigation is merged in 1.24 and is ready to be cherry-picked to 1.23.
/milestone v1.23