kubeflow: Several pods not starting due to various errors related to using NFS as the dynamic provisioner
/kind bug
What steps did you take and what happened:
I started by creating a dynamic NFS provisioner by applying

- https://github.com/justmeandopensource/kubernetes/blob/master/yamls/nfs-provisioner/rbac.yaml
- https://github.com/justmeandopensource/kubernetes/blob/master/yamls/nfs-provisioner/default-sc.yaml
- https://github.com/kubernetes-incubator/external-storage/blob/master/nfs-client/deploy/deployment.yaml

which follow the nfs-client provisioner from https://github.com/kubernetes-incubator/external-storage/tree/master/nfs-client (see the sketch of the apply commands below).
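For completeness, the apply step looked roughly like this; a sketch, assuming the raw.githubusercontent.com URLs are the direct equivalents of the GitHub links above, and that the NFS server/path values in deployment.yaml were first edited to point at my server:

```sh
kubectl apply -f https://raw.githubusercontent.com/justmeandopensource/kubernetes/master/yamls/nfs-provisioner/rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/justmeandopensource/kubernetes/master/yamls/nfs-provisioner/default-sc.yaml
# deployment.yaml edited beforehand so NFS_SERVER/NFS_PATH match my export
kubectl apply -f https://raw.githubusercontent.com/kubernetes-incubator/external-storage/master/nfs-client/deploy/deployment.yaml
```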
I then installed Kubeflow with https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_istio_dex.v1.0.0.yaml.
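The install itself was the standard kfctl flow for v1.0; roughly (the working directory name is illustrative):

```sh
export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_istio_dex.v1.0.0.yaml"
mkdir -p ~/kubeflow && cd ~/kubeflow
kfctl apply -V -f "${CONFIG_URI}"
```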
Several of the pods do not seem to be starting up due to various issues.
```
$ kubectl describe -n istio-system pod authservice-0
Name:           authservice-0
Namespace:      istio-system
Priority:       0
Node:           node1.kr.example.com/10.75.38.135
Start Time:     Tue, 03 Mar 2020 15:03:29 -0600
Labels:         app=authservice
                app.kubernetes.io/component=oidc-authservice
                app.kubernetes.io/instance=oidc-authservice-v1.0.0
                app.kubernetes.io/managed-by=kfctl
                app.kubernetes.io/name=oidc-authservice
                app.kubernetes.io/part-of=kubeflow
                app.kubernetes.io/version=v1.0.0
                controller-revision-hash=authservice-5f786759c5
                statefulset.kubernetes.io/pod-name=authservice-0
Annotations:    sidecar.istio.io/inject: false
Status:         Pending
IP:
Controlled By:  StatefulSet/authservice
Containers:
  authservice:
    Container ID:
    Image:          gcr.io/arrikto/kubeflow/oidc-authservice:28c59ef
    Image ID:
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Readiness:      http-get http://:8081/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      USERID_HEADER:  kubeflow-userid
      USERID_PREFIX:
      USERID_CLAIM:   email
      OIDC_PROVIDER:  http://dex.auth.svc.cluster.local:5556/dex
      OIDC_AUTH_URL:  /dex/auth
      OIDC_SCOPES:    profile email groups
      REDIRECT_URL:   /login/oidc
      SKIP_AUTH_URI:  /dex
      PORT:           8080
      CLIENT_ID:      kubeflow-oidc-authservice
      CLIENT_SECRET:  pUBnBOY80SnXgjibTYM9ZWNzY2xreNGQok
      STORE_PATH:     /var/lib/authservice/data.db
    Mounts:
      /var/lib/authservice from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-6wg9h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  authservice-pvc
    ReadOnly:   false
  default-token-6wg9h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6wg9h
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                    From                           Message
  ----     ------       ----                   ----                           -------
  Warning  FailedMount  34m (x397 over 17h)    kubelet, node1.kr.example.com  Unable to mount volumes for pod "authservice-0_istio-system(10199390-900a-4276-acea-b7aecdf456d7)": timeout expired waiting for volumes to attach or mount for pod "istio-system"/"authservice-0". list of unmounted volumes=[data]. list of unattached volumes=[data default-token-6wg9h]
  Warning  FailedMount  4m39s (x593 over 17h)  kubelet, node1.kr.example.com  (combined from similar events): MountVolume.SetUp failed for volume "pvc-72502148-4c02-4f45-a9aa-4cc19d701503" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/10199390-900a-4276-acea-b7aecdf456d7/volumes/kubernetes.io~nfs/pvc-72502148-4c02-4f45-a9aa-4cc19d701503 --scope -- mount -t nfs dell-ds1.example.com:/k8/istio-system-authservice-pvc-pvc-72502148-4c02-4f45-a9aa-4cc19d701503 /var/lib/kubelet/pods/10199390-900a-4276-acea-b7aecdf456d7/volumes/kubernetes.io~nfs/pvc-72502148-4c02-4f45-a9aa-4cc19d701503
Output: Running scope as unit: run-r177080a878e5475f952104755b41a3e9.scope
mount.nfs: Protocol not supported
```
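The `mount.nfs: Protocol not supported` failure can be reproduced outside of Kubernetes by rerunning the kubelet's mount by hand on the affected node. A sketch, with the host and export path taken from the event above; the version pinning is a diagnostic guess, not a confirmed fix:

```sh
# On node1.kr.example.com, repeat the mount the kubelet attempted:
EXPORT="dell-ds1.example.com:/k8/istio-system-authservice-pvc-pvc-72502148-4c02-4f45-a9aa-4cc19d701503"
sudo mkdir -p /mnt/nfs-test
sudo mount -t nfs "$EXPORT" /mnt/nfs-test

# If that reproduces "Protocol not supported", confirm the NFS client
# tooling is installed and try pinning an NFS version the server offers:
sudo apt-get install -y nfs-common
sudo mount -t nfs -o nfsvers=3 "$EXPORT" /mnt/nfs-test
```

If a pinned version works, the same option could be applied cluster-wide via the StorageClass `mountOptions` field.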
```
$ kubectl logs -n kubeflow mysql-6bcbfbb6b8-rzlf8
2020-03-04 13:35:11+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.6.47-1debian9 started.
2020-03-04 13:35:11+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2020-03-04 13:35:11+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.6.47-1debian9 started.
mkdir: cannot create directory '/var/lib/mysql/': File exists
```
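The `File exists` variant suggests the volume mounted but the directory is not usable as-is; the backing directory can be inspected on the NFS server (path pattern `${namespace}-${pvcName}-${pvName}` inferred from the mount event above):

```sh
# On the NFS server (dell-ds1.example.com): check ownership, mode, contents
ls -ld /k8/kubeflow-mysql-pv-claim-pvc-bdfbde9e-b056-4f6f-8415-2e8e18bcff7b
ls -la /k8/kubeflow-mysql-pv-claim-pvc-bdfbde9e-b056-4f6f-8415-2e8e18bcff7b
```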
```
$ kubectl logs -n kubeflow katib-db-manager-54b66f9f9d-d5dch
E0304 13:42:21.619878       1 mysql.go:62] Ping to Katib db failed: dial tcp 10.233.23.201:3306: connect: connection refused
E0304 13:42:26.611773       1 mysql.go:62] Ping to Katib db failed: dial tcp 10.233.23.201:3306: connect: connection refused
E0304 13:42:31.635879       1 mysql.go:62] Ping to Katib db failed: dial tcp 10.233.23.201:3306: connect: connection refused
E0304 13:42:36.627814       1 mysql.go:62] Ping to Katib db failed: dial tcp 10.233.23.201:3306: connect: connection refused
E0304 13:42:41.619904       1 mysql.go:62] Ping to Katib db failed: dial tcp 10.233.23.201:3306: connect: connection refused
E0304 13:42:46.611779       1 mysql.go:62] Ping to Katib db failed: dial tcp 10.233.23.201:3306: connect: connection refused
E0304 13:42:51.635869       1 mysql.go:62] Ping to Katib db failed: dial tcp 10.233.23.201:3306: connect: connection refused
E0304 13:42:56.627784       1 mysql.go:62] Ping to Katib db failed: dial tcp 10.233.23.201:3306: connect: connection refused
E0304 13:43:01.619889       1 mysql.go:62] Ping to Katib db failed: dial tcp 10.233.23.201:3306: connect: connection refused
E0304 13:43:06.611712       1 mysql.go:62] Ping to Katib db failed: dial tcp 10.233.23.201:3306: connect: connection refused
E0304 13:43:11.635854       1 mysql.go:62] Ping to Katib db failed: dial tcp 10.233.23.201:3306: connect: connection refused
E0304 13:43:16.627642       1 mysql.go:62] Ping to Katib db failed: dial tcp 10.233.23.201:3306: connect: connection refused
F0304 13:43:16.627719       1 main.go:83] Failed to open db connection: DB open failed: Timeout waiting for DB conn successfully opened.
goroutine 1 [running]:
github.com/kubeflow/katib/vendor/k8s.io/klog.stacks(0xc00024a200, 0xc0002520e0, 0x89, 0xd1)
	/go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:830 +0xb8
github.com/kubeflow/katib/vendor/k8s.io/klog.(*loggingT).output(0xdf1ca0, 0xc000000003, 0xc000278000, 0xd93a76, 0x7, 0x53, 0x0)
	/go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:781 +0x2d0
github.com/kubeflow/katib/vendor/k8s.io/klog.(*loggingT).printf(0xdf1ca0, 0x3, 0x9b448c, 0x20, 0xc0001e1f20, 0x1, 0x1)
	/go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:678 +0x14b
github.com/kubeflow/katib/vendor/k8s.io/klog.Fatalf(...)
	/go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:1209
main.main()
	/go/src/github.com/kubeflow/katib/cmd/db-manager/v1alpha3/main.go:83 +0x165
```

(These connection-refused errors appear to be downstream of katib-mysql, below, never coming up, rather than a separate failure.)
```
$ kubectl logs -n kubeflow katib-mysql-dcf7dcbd5-djx45
2020-03-04 13:46:11+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.19-1debian9 started.
2020-03-04 13:46:11+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2020-03-04 13:46:11+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.19-1debian9 started.
mkdir: cannot create directory '/var/lib/mysql': Permission denied
```
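`Permission denied` under `/var/lib/mysql` on an NFS-backed volume usually points at root squashing or restrictive ownership on the export. A sketch of what can be checked on the NFS server; `no_root_squash` is a commonly suggested workaround here, not a confirmed fix:

```sh
# On the NFS server, check how /k8 is exported:
grep /k8 /etc/exports
# A permissive export for testing typically looks like:
#   /k8 *(rw,sync,no_subtree_check,no_root_squash)

# After editing /etc/exports, re-export without restarting the server:
sudo exportfs -ra
```

The MySQL entrypoint runs as root long enough to chown its data directory before switching to the `mysql` user, which is why root squashing on the export tends to break it.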
```
$ kubectl logs -n kubeflow metadata-db-65fb5b695d-656hw
mkdir: cannot create directory '/var/lib/mysql': Permission denied
```
```
$ kubectl logs -n kubeflow metadata-grpc-deployment-75f9888cbf-d9q5m
2020-03-04 13:50:04.660297: F ml_metadata/metadata_store/metadata_store_server_main.cc:219] Non-OK-status: status status: Internal: mysql_real_connect failed: errno: 2002, error: Can't connect to MySQL server on 'metadata-db' (115)MetadataStore cannot be created with the given connection config.
```

(Like the katib-db-manager failure above, this looks like a downstream effect of its MySQL backend, metadata-db, never starting.)
What did you expect to happen:
All Kubeflow pods to come up and the application to function.
Anything else you would like to add:
All of my PVs and PVCs are bound.
```
$ kubectl get pv,pvc -A
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS          REASON   AGE
persistentvolume/pvc-0b7c97b5-a650-4355-91bc-00d5de17c4c3   10Gi       RWO            Delete           Bound    kubeflow/metadata-mysql        managed-nfs-storage            16h
persistentvolume/pvc-72502148-4c02-4f45-a9aa-4cc19d701503   10Gi       RWO            Delete           Bound    istio-system/authservice-pvc   managed-nfs-storage            18h
persistentvolume/pvc-bdfbde9e-b056-4f6f-8415-2e8e18bcff7b   20Gi       RWO            Delete           Bound    kubeflow/mysql-pv-claim        managed-nfs-storage            16h
persistentvolume/pvc-c6791418-ae14-42ca-9193-037ef31688d4   10Gi       RWO            Delete           Bound    kubeflow/katib-mysql           managed-nfs-storage            16h
persistentvolume/pvc-c988ed2d-d121-41fd-9cb5-f22c1906d64b   20Gi       RWO            Delete           Bound    kubeflow/minio-pv-claim        managed-nfs-storage            16h

NAMESPACE      NAME                                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
istio-system   persistentvolumeclaim/authservice-pvc     Bound    pvc-72502148-4c02-4f45-a9aa-4cc19d701503   10Gi       RWO            managed-nfs-storage   18h
kubeflow       persistentvolumeclaim/katib-mysql         Bound    pvc-c6791418-ae14-42ca-9193-037ef31688d4   10Gi       RWO            managed-nfs-storage   16h
kubeflow       persistentvolumeclaim/metadata-mysql      Bound    pvc-0b7c97b5-a650-4355-91bc-00d5de17c4c3   10Gi       RWO            managed-nfs-storage   16h
kubeflow       persistentvolumeclaim/minio-pv-claim      Bound    pvc-c988ed2d-d121-41fd-9cb5-f22c1906d64b   20Gi       RWO            managed-nfs-storage   16h
kubeflow       persistentvolumeclaim/mysql-pv-claim      Bound    pvc-bdfbde9e-b056-4f6f-8415-2e8e18bcff7b   20Gi       RWO            managed-nfs-storage   16h
```
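Since provisioning itself works (every claim binds), the remaining suspects are on the data path between the nodes and the NFS server. A couple of quick checks from a worker node:

```sh
# Is the export visible to the node at all?
showmount -e dell-ds1.example.com

# Which provisioner backs the default StorageClass?
kubectl get storageclass
kubectl describe storageclass managed-nfs-storage
```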
Environment:

- Kubeflow version: (version number can be found at the bottom left corner of the Kubeflow dashboard):
- kfctl version: (use `kfctl version`): kfctl v1.0-0-g94c35cf
- Kubernetes platform: kubespray
- Kubernetes version: (use `kubectl version`):

```
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
```

- OS (e.g. from `/etc/os-release`): Ubuntu 18.04.3
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 16 (1 by maintainers)
@vaskokj Just to add: I did a completely new install with the Kubeflow v1.0 release without any issues and without having to change anything. This was using PVCs from the nfs-client-provisioner.