kubeflow: MountVolume.SetUp failed for volume "webhook-tls-certs" : secret "webhook-server-tls" not found

/kind bug

What steps did you take and what happened: After installing Kubeflow v1.1.0, none of the workloads listed below is created. Each fails with the error "ReplicaSet "metadata-writer-694c48ccdc" has timed out progressing.; Deployment does not have minimum availability."

In the pod events, I see the error below. The same errors appear on all the pods of the workloads in the namespace mentioned below.

Warning FailedMount MountVolume.SetUp failed for volume "webhook-tls-certs" : secret "webhook-server-tls" not found a minute ago
Warning FailedMount Unable to mount volumes for pod "cache-server-65596854d-9r77s_kubeflow(645871c2-acb6-4a72-a0ee-e1c276a639e3)": timeout expired waiting for volumes to attach or mount for pod "kubeflow"/"cache-server-65596854d-9r77s". list of unmounted volumes=[webhook-tls-certs]. list of unattached volumes=[webhook-tls-certs kubeflow-pipelines-cache-token-55pr6 istio-envoy sds-uds-path istio-token] 6 minutes ago
Warning FailedMount Unable to mount volumes for pod "cache-server-65596854d-9r77s_kubeflow(645871c2-acb6-4a72-a0ee-e1c276a639e3)": timeout expired waiting for volumes to attach or mount for pod "kubeflow"/"cache-server-65596854d-9r77s". list of unmounted volumes=[webhook-tls-certs istio-token]. list of unattached volumes=[webhook-tls-certs kubeflow-pipelines-cache-token-55pr6 istio-envoy sds-uds-path istio-token] an hour ago
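
A quick way to confirm what the events report (a diagnostic sketch; webhook-server-tls is the secret named in the events above, and the cache-deployer appears to be what creates it, per its log later in this thread):

# Check whether the webhook TLS secret exists in the kubeflow namespace:
kubectl -n kubeflow get secret webhook-server-tls
# If it is missing, inspect the deployer that should have created it:
kubectl -n kubeflow logs deployment/cache-deployer-deployment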

Namespace: kubeflow. Affected workloads: cache-server, cache-deployer-deployment.

What did you expect to happen: Pods to initialize and become active.

Environment:

  • Kubeflow version: (version number can be found at the bottom left corner of the Kubeflow dashboard): v1.1.0; tried both with DEX and without

  • kfctl version: (use kfctl version): kfctl v1.1.0-0-g9a3621e

  • Kubernetes platform: (e.g. minikube)

  • Kubernetes version: (use kubectl version): Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-06T05:09:48Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}

  • OS (e.g. from /etc/os-release): CentOS 7.8

Most upvoted comments

@ShilpaGopal I'm not sure what the difference is between single-user and multi-user with regard to the cache deployer, but it does sound like the issue you are having is due to the cluster signer. I have faced this issue with both Kubeflow 1.1 and Kubeflow 1.2 on an on-prem deployment.

You could try setting --cluster-signing-cert-file and --cluster-signing-key-file on the kube-controller-manager, then removing the cache-deployer and cache-server deployments, and then running kfctl apply -V -f <your-kdef>; a sketch of these steps follows below. Do note that any changes you have made to your configuration that are not in the kustomize folder will be removed when doing this. Also, changing the cluster signing cert and key files might break other things in the cluster, so try it at your own risk.
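
A minimal sketch of those steps, assuming a kubeadm-style control plane where kube-controller-manager runs as a static pod (the manifest path and CA file locations below are assumptions; adjust them to your distribution):

# 1. Add the signing flags to the kube-controller-manager static pod manifest
#    (assumed path: /etc/kubernetes/manifests/kube-controller-manager.yaml on
#    each control-plane node; the kubelet restarts the pod automatically):
#      --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
#      --cluster-signing-key-file=/etc/kubernetes/pki/ca.key

# 2. Remove the cache deployments so they are recreated from scratch:
kubectl -n kubeflow delete deployment cache-deployer-deployment cache-server

# 3. Re-apply the KfDef (reverts any changes not kept in the kustomize folder):
kfctl apply -V -f <your-kdef>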

This works for me: Kubeflow 1.2.0, Rancher 1.9.4. After configuring kube-controller-manager as described in this comment, https://github.com/rancher/rancher/issues/14674#issuecomment-535234849, I deleted the KfDef and recreated it. Thanks a lot @ShilpaGopal

I had set up Kubeflow on an on-prem cluster without auth, and everything was working fine. I then added OIDC and redeployed Kubeflow; now the cache server is not coming up for the same reason. The cache deployer fails with:

$ k logs -f pod/cache-deployer-deployment-6f7b78cb7c-6q4cs -n kubeflow main
Start deploying cache service to existing cluster:
+ echo 'Start deploying cache service to existing cluster:'
+ NAMESPACE=kubeflow
+ MUTATING_WEBHOOK_CONFIGURATION_NAME=cache-webhook-kubeflow
+ WEBHOOK_SECRET_NAME=webhook-server-tls
+ kubectl get mutatingwebhookconfigurations cache-webhook-kubeflow --namespace kubeflow --ignore-not-found
The connection to the server 10.96.0.1:443 was refused - did you specify the right host or port?

I don't see any mutatingwebhookconfigurations named cache-webhook-kubeflow in the cluster. Any input on this is appreciated.
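
For reference, the two objects the deployer script checks for can also be inspected from outside the pod (names taken from the log above; note that mutatingwebhookconfigurations are cluster-scoped, so the --namespace flag in the script is effectively ignored):

# The webhook configuration the deployer looks for:
kubectl get mutatingwebhookconfigurations cache-webhook-kubeflow --ignore-not-found
# The TLS secret it would create once the webhook is set up:
kubectl -n kubeflow get secret webhook-server-tls --ignore-not-found
# The "connection refused" on 10.96.0.1:443 means the pod never reached the API
# server service at all, so the script's check failed before it could run.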