kubernetes: Suggest to make kubelet retry UnexpectedAdmissionError with an exponential delay

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug /kind feature /kind question

What happened:

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others: Kubernetes: v1.11.2

While testing nvidia-device-plugin, I triggered it to return the error: invalid allocation request: unknown device

https://github.com/NVIDIA/k8s-device-plugin/blob/v1.11/server.go#L165

Because the failure is reported as UnexpectedAdmissionError, it is retried continuously with no delay, and the failed pods pile up:

kubectl get po | grep UnexpectedAdmissionError | wc -l
1181

I suggest making it retry with an exponential delay (10s, 20s, 40s, …).

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 7
  • Comments: 26 (8 by maintainers)

Most upvoted comments

I’d like to second this bug report. Currently we are developing a device plugin, and the behavior surrounding UnexpectedAdmissionError is very undesirable.

We have worked around it by never returning errors from DevicePluginServer.Allocate(). Instead we return a “poison” value in ContainerAllocateResponse.Envs that causes the container creation to fail, which K8S handles in a much more amenable way. This is an ugly hack though.
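A rough Go sketch of that workaround follows. The two structs are simplified local stand-ins for the device plugin API messages (only the DevicesIDs and Envs fields are modeled), and the environment variable names and the allocate helper are hypothetical. How the poison value actually makes container creation fail depends on the runtime or entrypoint that reads it, which is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for the device plugin API messages; the real
// types carry more fields (mounts, device specs, annotations).
type ContainerAllocateRequest struct {
	DevicesIDs []string
}

type ContainerAllocateResponse struct {
	Envs map[string]string
}

// Hypothetical variable names, chosen only for this example.
const (
	poisonEnv  = "EXAMPLE_DEVICE_ALLOCATION_ERROR"
	devicesEnv = "EXAMPLE_VISIBLE_DEVICES"
)

// allocate never returns an error. For an unknown device it returns a
// response whose Envs carry a poison value instead, so the failure is
// deferred to container creation rather than pod admission.
func allocate(known map[string]bool, req ContainerAllocateRequest) ContainerAllocateResponse {
	for _, id := range req.DevicesIDs {
		if !known[id] {
			return ContainerAllocateResponse{
				Envs: map[string]string{poisonEnv: "unknown device " + id},
			}
		}
	}
	return ContainerAllocateResponse{
		Envs: map[string]string{devicesEnv: strings.Join(req.DevicesIDs, ",")},
	}
}

func main() {
	known := map[string]bool{"gpu-0": true, "gpu-1": true}
	fmt.Println(allocate(known, ContainerAllocateRequest{DevicesIDs: []string{"gpu-0"}}).Envs)
	fmt.Println(allocate(known, ContainerAllocateRequest{DevicesIDs: []string{"gpu-9"}}).Envs)
}
```

The trade-off is the one the comment describes: the pod passes admission and the error surfaces later, where it is handled more gracefully, at the cost of hiding the real cause inside an environment variable.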

Same bug here with version v1.14.3.