kubeflow: MountVolume.SetUp failed for volume "webhook-tls-certs" : secret "webhook-server-tls" not found
/kind bug
What steps did you take and what happened: After Kubeflow v1.1.0 is installed, none of the workloads below is created. They fail with the error `ReplicaSet "metadata-writer-694c48ccdc" has timed out progressing; Deployment does not have minimum availability.`
In the pod events, I see the errors below. The same errors appear on all the pods in the namespace and workloads listed below:
| Type | Reason | Message | Age |
| --- | --- | --- | --- |
| Warning | FailedMount | MountVolume.SetUp failed for volume "webhook-tls-certs" : secret "webhook-server-tls" not found | a minute ago |
| Warning | FailedMount | Unable to mount volumes for pod "cache-server-65596854d-9r77s_kubeflow(645871c2-acb6-4a72-a0ee-e1c276a639e3)": timeout expired waiting for volumes to attach or mount for pod "kubeflow"/"cache-server-65596854d-9r77s". list of unmounted volumes=[webhook-tls-certs]. list of unattached volumes=[webhook-tls-certs kubeflow-pipelines-cache-token-55pr6 istio-envoy sds-uds-path istio-token] | 6 minutes ago |
| Warning | FailedMount | Unable to mount volumes for pod "cache-server-65596854d-9r77s_kubeflow(645871c2-acb6-4a72-a0ee-e1c276a639e3)": timeout expired waiting for volumes to attach or mount for pod "kubeflow"/"cache-server-65596854d-9r77s". list of unmounted volumes=[webhook-tls-certs istio-token]. list of unattached volumes=[webhook-tls-certs kubeflow-pipelines-cache-token-55pr6 istio-envoy sds-uds-path istio-token] | an hour ago |
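For anyone hitting the same symptom, a few generic checks (not from the original report) can confirm whether the secret is actually missing. The pod name below is copied from the events above and will differ in other clusters; the last command assumes, as the rest of this thread does, that the cache deployer is what creates the secret:

```shell
# Check whether the TLS secret the cache-server volume expects exists.
kubectl -n kubeflow get secret webhook-server-tls

# Inspect the mount-failure events on the affected pod
# (pod name taken from the events above; yours will differ).
kubectl -n kubeflow describe pod cache-server-65596854d-9r77s

# The secret is issued by the cache deployer, so check its logs too.
kubectl -n kubeflow logs deployment/cache-deployer-deployment
```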
Namespace: `kubeflow`. Affected workloads: `cache-server`, `cache-deployer-deployment`.
What did you expect to happen: Pods to be initialized and become active.
Anything else you would like to add:
Environment:

- Kubeflow version (bottom left corner of the Kubeflow dashboard): v1.1.0, tried both with DEX and without
- kfctl version (use `kfctl version`): kfctl v1.1.0-0-g9a3621e
- Kubernetes platform: (e.g. minikube)
- Kubernetes version (use `kubectl version`):
  - Client: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
  - Server: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-06T05:09:48Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
- OS (e.g. from `/etc/os-release`): CentOS 7.8
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 10
- Comments: 37 (14 by maintainers)
This works for me, kubeflow 1.2.0, rancher 1.9.4. After configuring `kube-controller-manager` as in this comment https://github.com/rancher/rancher/issues/14674#issuecomment-535234849, I deleted the kfdef and recreated it. Thanks a lot @ShilpaGopal

@ShilpaGopal I'm not sure what the difference is between single-user and multi-user with regard to the cache deployer, but to me it does sound like the issue you are having is due to the cluster signer. I have faced this issue with both Kubeflow 1.1 and Kubeflow 1.2 on an on-prem deployment.

You could try setting `--cluster-signing-cert-file` and `--cluster-signing-key-file` for the kube-controller-manager, then remove the cache-deployer and cache-server deployments, then run `kfctl apply -V -f <your-kfdef>`. Do note that any changes you have made to your configuration that are not in the kustomize folder will be removed when doing this. Also, messing with the cluster signing cert and key file might break other things in the cluster, so try at your own risk.

I had set up Kubeflow in an on-prem cluster without AUTH, where everything was working fine. Now I added OIDC and redeployed Kubeflow; the cache server is not coming up for the same reason. The cache deployer fails with:

I don't see any mutatingwebhookconfigurations by name cache-webhook-kubeflow in the cluster. Any input on this is appreciated.
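A sketch of the remediation suggested above (not from the original thread): the two flag names are real kube-controller-manager flags, but the certificate paths shown are kubeadm-style defaults and an assumption here; how the controller-manager is restarted varies by distribution (Rancher users should follow the linked rancher/rancher#14674 comment instead):

```shell
# Point the controller-manager's built-in CSR signer at the cluster CA.
# The flag names are real; the paths are typical kubeadm defaults and
# may differ on your distribution (assumption — adjust as needed):
#   kube-controller-manager \
#     --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt \
#     --cluster-signing-key-file=/etc/kubernetes/pki/ca.key

# Check whether the webhook the cache deployer should create exists.
kubectl get mutatingwebhookconfigurations cache-webhook-kubeflow

# Remove the failed deployments so the deployer can re-run and
# re-issue the webhook certificate.
kubectl -n kubeflow delete deployment cache-server cache-deployer-deployment

# Reapply the Kubeflow definition (placeholder path as in the comment above).
kfctl apply -V -f <your-kfdef>
```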