kserve: Uber Issue: KFServing admission hook causing widespread issues because it's a global admission hook
/kind bug
We are getting lots of reports of problems caused when the KFServing admission webhook is unavailable, preventing pods from being created. The error message looks like the following:
4m58s Warning FailedCreate replicaset/activator-5484756f7b Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
Here’s my understanding:
- Currently AdmissionHooks cannot be scoped by label, so a pod admission hook is applied to all pods.
- The KFServing admission hook is therefore applied to every pod; the hook itself then checks whether the pod belongs to a KFServing resource and, only if it does, applies the mutation.
- However, if the KFServing webhook deployment is unavailable, pod creation can be blocked.
- For a variety of reasons we are reaching a deadlock state (see the diagnostic sketch below) where:
  - the webhook is defined but the deployment backing it is not, so calls to the admission hook fail;
  - pod creation now fails because the webhook cannot be reached, including creation of the webhook's own pods.
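To make the deadlocked state concrete, here is a small diagnostic sketch; the resource names are taken from the error message above and may differ in your cluster.

```sh
# Diagnostic sketch for the deadlock described above (names taken from the
# error message earlier in this issue; exact resource names may differ).

# The mutating webhook configuration still references the KFServing pod mutator...
kubectl get mutatingwebhookconfigurations | grep -i inferenceservice

# ...but the service and controller pods backing it are gone, so every pod
# CREATE in the cluster fails with "failed calling webhook".
kubectl -n kubeflow get service kfserving-webhook-server-service
kubectl -n kubeflow get pods | grep -i kfserving
```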
About this issue
- State: closed
- Created 5 years ago
- Reactions: 3
- Comments: 43 (18 by maintainers)
@maganaluis We need to use an object selector on the mutating webhook configuration so that only KFServing-labelled pods go through the KFServing pod mutator. The problem is that objectSelector is only supported on Kubernetes 1.15+, while Kubeflow's minimum requirement is still Kubernetes 1.14. If you are on Kubernetes 1.15+ you can use the following command to solve the issue.
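The exact command from that comment is not reproduced here; the following is a sketch of what such a patch could look like, assuming the webhook configuration is named `inferenceservice.serving.kubeflow.org` and that KFServing-managed pods carry a `serving.kubeflow.org/inferenceservice` label (both are assumptions and may differ in your install):

```sh
# Sketch only: restrict the pod mutator to KFServing-labelled pods via an
# objectSelector (requires Kubernetes 1.15+). The webhook configuration name
# and label key below are assumptions.
kubectl patch mutatingwebhookconfiguration inferenceservice.serving.kubeflow.org \
  --type=json \
  -p '[{"op": "add", "path": "/webhooks/0/objectSelector", "value": {"matchExpressions": [{"key": "serving.kubeflow.org/inferenceservice", "operator": "Exists"}]}}]'
```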
Possible fixes
1. Add the label control-plane to the kubeflow namespace.
2. Change the namespaceSelector to be opt in; match only namespaces with specific labels. Ref: https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#matching-requests-namespaceselector

Possible Work Arounds
Add the label control-plane to the kubeflow namespace.

A possible recipe (a sketch of these steps follows below):
1. Get the inference spec
2. Change the matchSelector
3. Apply it
4. Label any namespaces in which you want to use KFServing with the matching label
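A hedged sketch of that recipe, assuming the same webhook configuration name as above and an opt-in label of `serving.kubeflow.org/inferenceservice: enabled` (the actual label used by your install may differ):

```sh
# 1. Get the inference spec
kubectl get mutatingwebhookconfiguration inferenceservice.serving.kubeflow.org \
  -o yaml > webhook.yaml

# 2. Change the namespaceSelector in webhook.yaml to opt in, for example:
#      namespaceSelector:
#        matchLabels:
#          serving.kubeflow.org/inferenceservice: enabled

# 3. Apply it
kubectl apply -f webhook.yaml

# 4. Label any namespace in which you want to use KFServing
kubectl label namespace <your-namespace> serving.kubeflow.org/inferenceservice=enabled
```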
@animeshsingh @jlewi We encountered this issue when testing KFP multi-user support; I actually just re-verified this with the latest changes on the manifests repo. The first problem comes when the control-plane label is enabled on the kubeflow namespace: this prevents the Istio sidecar injection from working in that namespace, so in order to have multi-user support for KFP we removed the label. I didn’t investigate any further why this happens, but I’d love to get some documentation on what KFServing is doing in those webhooks.

The second issue is the deadlock outlined above. It happened when I deleted the kubeflow resources and then attempted to reinstall kubeflow (with Istio already installed); this caused the widespread issue that prevented any pods from being created.

To avoid the deadlock and to keep sidecar injection on the kubeflow namespace, I had to re-apply the profile with istioctl (we are using 1.6), create the kubeflow namespace without the control-plane label, and then proceed to install kubeflow, KFServing, Knative, etc. It’s quite strange, but I suspect you or others will run into these issues, so I wanted to post this information here.
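For reference, a rough sketch of that sequence; the Istio profile, namespace label, and ordering below are assumptions based on the description, not the exact commands used:

```sh
# Re-apply the Istio profile (Istio 1.6)
istioctl install --set profile=default

# Create the kubeflow namespace without the control-plane label, but with
# sidecar injection enabled (label name assumed)
kubectl create namespace kubeflow
kubectl label namespace kubeflow istio-injection=enabled

# Then proceed with the kubeflow / KFServing / Knative installs as usual.
```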
@lluunn yes.
#571 and https://github.com/kubeflow/kubeflow/pull/4533 are what you need
So this is related to https://github.com/kubeflow/kfserving/issues/480. As long as the kfserving controller is available, things work, even though it is looking at every Pod. The way that all Pod submissions can fail is if the controller itself isn’t available: when Kubernetes tries to bring it back, the hook fires, but the controller isn’t there to serve the hook, so you get a catch-22.