kubeflow: installation gets stuck at microk8s.enable kubeflow

/kind bug

Kubeflow cannot be installed on my VM: the installation gets stuck at microk8s.enable kubeflow. Apparently the problem is with Juju.

What steps did you take and what happened: I followed the steps in the official tutorial at https://ubuntu.com/kubeflow/install

What did you expect to happen: ErrImagePull occurred in many pods, and the describe output shows that the images cannot be pulled.

Anything else you would like to add:

root@Video04:~# microk8s.kubectl get pods --all-namespaces
NAMESPACE                       NAME                                        READY   STATUS             RESTARTS   AGE
controller-microk8s-localhost   controller-0                                2/2     Running            4          83m
controller-microk8s-localhost   modeloperator-56d4bb6587-stbg5              1/1     Running            0          64m
ingress                         nginx-ingress-microk8s-controller-kw2kh     1/1     Running            0          3h42m
kube-system                     coredns-588fd544bf-pwncg                    1/1     Running            0          103m
kube-system                     dashboard-metrics-scraper-59f5574d4-7mvl8   1/1     Running            0          88m
kube-system                     hostpath-provisioner-75fdc8fccd-c2rph       1/1     Running            0          3h42m
kube-system                     kubernetes-dashboard-6d97855997-lw2zg       1/1     Running            0          88m
kube-system                     metrics-server-c65c9d66-fgztk               1/1     Running            0          88m
kubeflow                        ambassador-6dcd6bf4c9-f9dgb                 0/1     ErrImagePull       0          56m
kubeflow                        ambassador-operator-0                       1/1     Running            0          56m
kubeflow                        argo-controller-operator-0                  1/1     Running            0          51m
kubeflow                        argo-ui-6bd579bc67-s5wrp                    0/1     ImagePullBackOff   0          45m
kubeflow                        argo-ui-operator-0                          1/1     Running            0          45m
kubeflow                        dex-auth-operator-0                         1/1     Running            0          39m
kubeflow                        jupyter-controller-f6cbc6889-bzpn5          0/1     ImagePullBackOff   0          33m
kubeflow                        jupyter-controller-operator-0               1/1     Running            0          33m
kubeflow                        jupyter-web-operator-0                      1/1     Running            0          26m
kubeflow                        katib-controller-7547bdb7cd-x4252           0/1     PodInitializing    0          17m
kubeflow                        katib-controller-operator-0                 1/1     Running            0          19m
kubeflow                        katib-db-manager-operator-0                 1/1     Running            0          8m29s
kubeflow                        katib-db-operator-0                         1/1     Running            0          14m
kubeflow                        modeloperator-77587bd695-5krjt              1/1     Running            0          62m

And the describe output (I am not showing every failing pod, but the errors are the same; all of the failing images come from registry.jujucharms.com):

`root@Video04:~# microk8s.kubectl describe -n kubeflow pod/ambassador-6dcd6bf4c9-f9dgb
Name:         ambassador-6dcd6bf4c9-f9dgb
Namespace:    kubeflow
Priority:     0
Node:         video04/192.168.1.25
Start Time:   Mon, 29 Jun 2020 14:44:36 +0800
Labels:       juju-app=ambassador
              pod-template-hash=6dcd6bf4c9
Annotations:  apparmor.security.beta.kubernetes.io/pod: runtime/default
              juju.io/controller: 376c9ef1-cde7-4caa-8cb3-aeb53691585a
              juju.io/model: f9409cec-a0c8-4737-8d69-9610347cb59c
              juju.io/unit: ambassador/0
              seccomp.security.beta.kubernetes.io/pod: docker/default
Status:       Pending
IP:           10.1.97.11
IPs:
  IP:           10.1.97.11
Controlled By:  ReplicaSet/ambassador-6dcd6bf4c9
Init Containers:
  juju-pod-init:
    Container ID:  containerd://0e36c9782c18286a7f142c35c3093eef4df970537e1a7d2b1424438b1f4de163
    Image:         jujusolutions/jujud-operator:2.9-beta1.3843
    Image ID:      docker.io/jujusolutions/jujud-operator@sha256:e6c084c7754c4d906b9dff293a2b0d42df4adf65cf8c2659435c1ac4928dda74
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      export JUJU_DATA_DIR=/var/lib/juju
      export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools

      mkdir -p $JUJU_TOOLS_DIR
      cp /opt/jujud $JUJU_TOOLS_DIR/jujud
      initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
      if test -n "$initCmd"; then
      $JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
      else
      exit 0
      fi

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 29 Jun 2020 14:44:38 +0800
      Finished:     Mon, 29 Jun 2020 14:44:42 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/juju from juju-data-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-f6z9l (ro)
Containers:
  ambassador:
    Container ID:
    Image:          registry.jujucharms.com/kubeflow-charmers/ambassador/oci-image@sha256:ac028903c79f6913e132522b0e088bb4a1ef312a37d30eb37473cad513434b32
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ErrImagePull
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:8877/ambassador/v0/check_alive delay=30s timeout=1s period=30s #success=1 #failure=3
    Readiness:      http-get http://:8877/ambassador/v0/check_ready delay=30s timeout=1s period=30s #success=1 #failure=3
    Environment:
      AMBASSADOR_NAMESPACE:  kubeflow
    Mounts:
      /usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
      /var/lib/juju from juju-data-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-f6z9l (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  juju-data-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  default-token-f6z9l:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-f6z9l
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                  From               Message
  Normal   Scheduled  46m                  default-scheduler  Successfully assigned kubeflow/ambassador-6dcd6bf4c9-f9dgb to video04
  Normal   Pulled     46m                  kubelet, video04   Container image "jujusolutions/jujud-operator:2.9-beta1.3843" already present on machine
  Normal   Created    46m                  kubelet, video04   Created container juju-pod-init
  Normal   Started    45m                  kubelet, video04   Started container juju-pod-init
  Normal   BackOff    37m                  kubelet, video04   Back-off pulling image "registry.jujucharms.com/kubeflow-charmers/ambassador/oci-image@sha256:ac028903c79f6913e132522b0e088bb4a1ef312a37d30eb37473cad513434b32"
  Warning  Failed     37m                  kubelet, video04   Error: ImagePullBackOff
  Warning  Failed     21m (x2 over 37m)    kubelet, video04   Failed to pull image "registry.jujucharms.com/kubeflow-charmers/ambassador/oci-image@sha256:ac028903c79f6913e132522b0e088bb4a1ef312a37d30eb37473cad513434b32": rpc error: code = Unknown desc = failed to pull and unpack image "registry.jujucharms.com/kubeflow-charmers/ambassador/oci-image@sha256:ac028903c79f6913e132522b0e088bb4a1ef312a37d30eb37473cad513434b32": failed to copy: unexpected EOF
  Warning  Failed     21m (x2 over 37m)    kubelet, video04   Error: ErrImagePull
  Normal   Pulling    21m (x3 over 45m)    kubelet, video04   Pulling image "registry.jujucharms.com/kubeflow-charmers/ambassador/oci-image@sha256:ac028903c79f6913e132522b0e088bb4a1ef312a37d30eb37473cad513434b32"`
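To see only the pods affected by the image pull failure, the pod list can be filtered by its STATUS column. This is a minimal sketch, assuming the default column order of `kubectl get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); the function name `image_pull_failures` is hypothetical and not part of kubectl or microk8s.

```shell
# Reads `kubectl get pods --all-namespaces` output on stdin and prints
# "namespace/pod status" for every pod whose STATUS is an image pull failure.
# Assumption: the default six-column layout, so STATUS is the 4th field.
image_pull_failures() {
  awk 'NR > 1 && ($4 == "ErrImagePull" || $4 == "ImagePullBackOff") {
    print $1 "/" $2, $4
  }'
}
```

It would be used as `microk8s.kubectl get pods --all-namespaces | image_pull_failures`. Each reported pod can then be inspected with `microk8s.kubectl describe` as shown for the ambassador pod.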

What is more, I tried to pull the image manually with docker and with microk8s.ctr, and this is what I get: **Error response from daemon: pull access denied for registry.jujucharms.com/kubeflow-charmers/ambassador/oci-image, repository does not exist or may require 'docker login': denied: requested access to the resource is denied**

So I wonder whether the image source is fine, or whether I need an account to access it. Thank you all.

Environment:

  • Kubeflow version: (version number can be found at the bottom left corner of the Kubeflow dashboard): microk8s 1.0.2
  • kfctl version: (use kfctl version): microk8s 1.18.4
  • Kubernetes platform: (e.g. minikube) microk8s
  • Kubernetes version: (use kubectl version): microk8s 1.18.4
  • OS (e.g. from /etc/os-release): ubuntu 18.04 LTS

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 16 (6 by maintainers)

Most upvoted comments

Did anyone ever find the fix? I am getting this issue.

@rushins: that password is auto-generated as a default option. You can press enter to accept it, or type in your own instead. I’ll update that text to make that more clear

Is there any solution or workaround for this issue? It looks the same as "Failed to pull from registry.jujucharms.com/kubeflow-charmers".

@jiaozhentian: I just tried microk8s.enable kubeflow, and it worked fine for me. The error you’re getting looks like a transient network error. Are you able to try again and see if that fixes it?
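If the failure really is transient, retrying the pull a few times with a pause in between may be enough. The following is a minimal sketch of a generic retry helper in POSIX shell; the function name `retry` and its argument order are assumptions made for illustration, not something provided by microk8s or Juju.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made,
# sleeping DELAY_SECONDS between tries. Returns 0 on success, 1 otherwise.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}
```

For example, `retry 5 30 microk8s.ctr image pull "$IMAGE"` would try the pull up to five times, 30 seconds apart, where `IMAGE` is assumed to hold the full image reference from the describe output. This only helps with network interruptions such as the "unexpected EOF" above; an "access denied" response from the registry will fail the same way on every attempt.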


The link is down.