kubernetes: Suggest to make kubelet retry UnexpectedAdmissionError with an exponential delay
Is this a BUG REPORT or FEATURE REQUEST?:
Uncomment only one, leave it on its own line:
/kind bug /kind feature /kind question
What happened:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
Kubernetes: v1.11.2
While testing nvidia-device-plugin, I triggered it to return this error: invalid allocation request: unknown device
https://github.com/NVIDIA/k8s-device-plugin/blob/v1.11/server.go#L165
The pod is retried continuously because the failure is an UnexpectedAdmissionError.
kubectl get po | grep UnexpectedAdmissionError | wc -l
1181
I suggest making it retry with an exponential delay (10s, 20s, 40s, …).
About this issue
- State: closed
- Created 6 years ago
- Reactions: 7
- Comments: 26 (8 by maintainers)
I’d like to second this bug report. Currently we are developing a device plugin, and the behavior surrounding UnexpectedAdmissionError is very undesirable.
We have worked around it by never returning errors from DevicePluginServer.Allocate(). Instead we return a “poison” value in ContainerAllocateResponse.Envs that causes the container creation to fail, which K8S handles in a much more amenable way. This is an ugly hack though.
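For anyone wanting to try the same workaround, a rough sketch of its shape follows. The structs are simplified local stand-ins for the device plugin API messages, not the real generated types, and the environment variable name, the poison value, and the allocate helper are all hypothetical. It also assumes the container runtime rejects the poison value when it creates the container, which depends on the runtime or hook in use.

```go
package main

import "strings"

// Simplified stand-ins for the device plugin API messages. The real types
// carry more fields (mounts, device specs, annotations).
type ContainerAllocateRequest struct {
	DevicesIDs []string
}

type ContainerAllocateResponse struct {
	Envs map[string]string
}

// Hypothetical names chosen for this sketch only.
const (
	visibleDevicesEnv = "EXAMPLE_VISIBLE_DEVICES"
	poisonValue       = "invalid-unknown-device"
)

// allocate never reports an error. If any requested device ID is unknown,
// it returns a poison value in Envs so that the failure happens later, at
// container creation, instead of at pod admission.
func allocate(known map[string]bool, req ContainerAllocateRequest) ContainerAllocateResponse {
	for _, id := range req.DevicesIDs {
		if !known[id] {
			return ContainerAllocateResponse{
				Envs: map[string]string{visibleDevicesEnv: poisonValue},
			}
		}
	}
	// All devices are known: expose them as a comma-separated list.
	return ContainerAllocateResponse{
		Envs: map[string]string{visibleDevicesEnv: strings.Join(req.DevicesIDs, ",")},
	}
}
```

The trade-off is that the real cause of the failure is hidden behind a generic container creation error, so it is worth logging the unknown device ID in the plugin itself.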
Same bug observed with version v1.14.3.