kubeflow: Kubeflow on EKS not starting jupyter notebooks

Hello, I’m playing around with a Kubeflow installation on EKS following the instructions in https://www.kubeflow.org/docs/aws/deploy/install-kubeflow/

Everything seems to work fine, but when I try to start a Jupyter notebook server and click “SPAWN”, the request returns a 500.

Checking the logs in the jupyter-web-app pod I see the following:

[2019-05-22 15:31:20,934] ERROR in app: Exception on /api/namespaces/kubeflow/notebooks [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/app/kubeflow_jupyter/default/app.py", line 40, in post_notebook
    poddefaultLabels = api.get_poddefaults_labels(namespace)
  File "/app/kubeflow_jupyter/common/api.py", line 123, in get_poddefaults_labels
    ns, "poddefaults")['items']
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/custom_objects_api.py", line 1432, in list_namespaced_custom_object
    (data) = self.list_namespaced_custom_object_with_http_info(group, version, namespace, plural, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/custom_objects_api.py", line 1538, in list_namespaced_custom_object_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 321, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 155, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 342, in request
    headers=headers)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 231, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'db5efa14-8ccd-4921-a2ac-8c885afe154b', 'Content-Type': 'text/plain; charset=utf-8', 'X-Content-Type-Options': 'nosniff', 'Date': 'Wed, 22 May 2019 15:31:20 GMT', 'Content-Length': '19'})
HTTP response body: 404 page not found

So it seems to me that it’s trying to hit an endpoint in the Kubernetes API that doesn’t exist (404). I’m using the current Kubeflow master with Kubernetes v1.12 on EKS. I’m running a very simple setup in a dedicated VPC; the only caveat is that I’m using dedicated IAM roles, as unfortunately I can’t let eksctl create them on demand.
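A plain-text "404 page not found" on a custom-object list usually means the API server has no route for that group/version, which is what you would see if the CustomResourceDefinition were not installed. Below is a minimal sketch for checking that. The CRD name poddefaults.kubeflow.org is an assumption inferred from the "poddefaults" plural in the traceback, and check_crd is a hypothetical helper, not part of Kubeflow:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a CustomResourceDefinition is registered
# with the API server. Prints "present: <name>" or "missing: <name>".
check_crd() {
  local crd="$1"
  if kubectl get crd "$crd" >/dev/null 2>&1; then
    echo "present: $crd"
  else
    echo "missing: $crd"
    return 1
  fi
}

# Usage (needs access to the cluster); the CRD name is an assumption:
#   check_crd poddefaults.kubeflow.org
```

If the CRD is reported missing, the jupyter-web-app image is likely newer than the manifests deployed to the cluster.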

Can anyone help me debug the issue?

Thanks a lot for this project and thank you in advance for your help! Alessandro

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 18 (11 by maintainers)

Most upvoted comments

`export KUBEFLOW_VERSION=v0.5-branch` is NOT right.

`export KUBEFLOW_VERSION=0.5-branch` is right. Then run curl https://raw.githubusercontent.com/kubeflow/kubeflow/v0.5-branch/scripts/download.sh | bash.

This is because https://raw.githubusercontent.com/kubeflow/kubeflow/v0.5-branch/scripts/download.sh adds the v character itself: KUBEFLOW_TAG=v${KUBEFLOW_VERSION}
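The prefix problem described in this comment can be sketched as follows. normalize_version is a hypothetical helper and not part of download.sh; only the KUBEFLOW_TAG=v${KUBEFLOW_VERSION} expansion comes from the comment above. The sketch strips one leading "v" so the tag does not end up as vv0.5-branch:

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip a single leading "v" from a version string so the
# prefix is not doubled when the tag is built as v${KUBEFLOW_VERSION}.
normalize_version() {
  printf '%s\n' "${1#v}"
}

KUBEFLOW_VERSION="$(normalize_version "v0.5-branch")"
# Same expansion the comment quotes from download.sh.
KUBEFLOW_TAG="v${KUBEFLOW_VERSION}"
echo "KUBEFLOW_VERSION=${KUBEFLOW_VERSION} KUBEFLOW_TAG=${KUBEFLOW_TAG}"
# prints: KUBEFLOW_VERSION=0.5-branch KUBEFLOW_TAG=v0.5-branch
```

Passing an already-bare value such as 0.5-branch leaves it unchanged, so the helper is safe to apply either way.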

@amarrella Can you switch to an older version temporarily?

kubectl -n kubeflow edit deployment jupyter-web-app

Change the image to `gcr.io/kubeflow-dev/jupyter-web-app:v0-43-g810b0b46`

The latest change for PodDefault and the AdmissionController is not compatible with the current AWS settings.
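For a non-interactive alternative to kubectl edit, the same image pin could be applied with kubectl set image. This is only a sketch: pin_jupyter_web_app_image is a hypothetical wrapper, and the "*" wildcard (which targets every container in the deployment) is used because the container name is not given in this thread:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: point every container of the jupyter-web-app
# deployment in the kubeflow namespace at the given image.
pin_jupyter_web_app_image() {
  local image="$1"
  # "*" is quoted so the shell does not glob-expand it; kubectl treats it as
  # "all containers in the pod template".
  kubectl -n kubeflow set image deployment/jupyter-web-app "*=${image}"
}

# Usage (needs access to the cluster):
#   pin_jupyter_web_app_image gcr.io/kubeflow-dev/jupyter-web-app:v0-43-g810b0b46
```

If the deployment runs more than one container, replace the wildcard with the specific container name to avoid changing the others.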

@amarrella I mean use kfctl.sh from v0.5-branch.

export KUBEFLOW_SRC=/tmp/kubeflow-aws
unset KUBEFLOW_TAG # in case you set this env.
export KUBEFLOW_VERSION=v0.5-branch
mkdir -p ${KUBEFLOW_SRC} && cd ${KUBEFLOW_SRC}
curl https://raw.githubusercontent.com/kubeflow/kubeflow/${KUBEFLOW_VERSION}/scripts/download.sh | bash

Then edit ${KUBEFLOW_SRC}/scripts/aws/util.sh to add the following changes before you apply k8s. I will cherry-pick this PR back to 0.5.

https://github.com/kubeflow/kubeflow/pull/3340/files