kubeflow: Kubeflow on EKS not starting jupyter notebooks
Hello, I’m playing around with a Kubeflow installation on EKS, following the instructions in https://www.kubeflow.org/docs/aws/deploy/install-kubeflow/
Everything seems to work fine, but when I try to start a Jupyter notebook server and click “SPAWN”, the request returns a 500.
Checking the logs in the jupyter-web-app pod I see the following:
[2019-05-22 15:31:20,934] ERROR in app: Exception on /api/namespaces/kubeflow/notebooks [POST]
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 35, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/app/kubeflow_jupyter/default/app.py", line 40, in post_notebook
poddefaultLabels = api.get_poddefaults_labels(namespace)
File "/app/kubeflow_jupyter/common/api.py", line 123, in get_poddefaults_labels
ns, "poddefaults")['items']
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/custom_objects_api.py", line 1432, in list_namespaced_custom_object
(data) = self.list_namespaced_custom_object_with_http_info(group, version, namespace, plural, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/custom_objects_api.py", line 1538, in list_namespaced_custom_object_with_http_info
collection_formats=collection_formats)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 321, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 155, in __call_api
_request_timeout=_request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 342, in request
headers=headers)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 231, in GET
query_params=query_params)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 222, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'db5efa14-8ccd-4921-a2ac-8c885afe154b', 'Content-Type': 'text/plain; charset=utf-8', 'X-Content-Type-Options': 'nosniff', 'Date': 'Wed, 22 May 2019 15:31:20 GMT', 'Content-Length': '19'})
HTTP response body: 404 page not found
So to me it seems it’s trying to hit an endpoint in the Kubernetes API that doesn’t exist (404). I’m using the current master of Kubeflow with Kubernetes v1.12 on EKS. I’m running a very simple setup in a dedicated VPC; the only caveat is that I’m using dedicated IAM roles, as unfortunately I can’t let eksctl create them on demand.
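The traceback ends in list_namespaced_custom_object for the "poddefaults" plural, so one way to narrow this down is to repeat that single call and see whether the API server answers 404. The following is only a rough sketch, not what jupyter-web-app itself does: the function name poddefaults_registered is made up for illustration, and the group "kubeflow.org" and version "v1alpha1" are assumptions that should be checked against the manifests actually deployed.

```python
def poddefaults_registered(custom_api, namespace,
                           group="kubeflow.org", version="v1alpha1"):
    """Return True if PodDefaults can be listed, False if the API answers 404.

    custom_api is expected to behave like kubernetes.client.CustomObjectsApi.
    The group and version defaults are assumptions, not taken from the issue.
    """
    try:
        # Same client call that appears at the bottom of the traceback.
        custom_api.list_namespaced_custom_object(
            group, version, namespace, "poddefaults")
        return True
    except Exception as exc:
        # kubernetes.client.rest.ApiException carries the HTTP code in .status.
        if getattr(exc, "status", None) == 404:
            return False
        # Anything else (403, timeouts, ...) is a different problem; re-raise.
        raise


# Against a real cluster (requires the kubernetes package and a kubeconfig):
#
#   from kubernetes import client, config
#   config.load_kube_config()
#   print(poddefaults_registered(client.CustomObjectsApi(), "kubeflow"))
```

If this returns False, the 404 would be consistent with the PodDefault custom resource not being registered in the cluster, rather than with a problem in the notebook spawner itself.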
Can anyone help me debug the issue?
Thanks a lot for this project and thank you in advance for your help! Alessandro
About this issue
- State: closed
- Created 5 years ago
- Comments: 18 (11 by maintainers)
Use
export KUBEFLOW_VERSION=0.5-branch
NOT
export KUBEFLOW_VERSION=v0.5-branch
and then run
curl https://raw.githubusercontent.com/kubeflow/kubeflow/v0.5-branch/scripts/download.sh | bash
This is because https://raw.githubusercontent.com/kubeflow/kubeflow/v0.5-branch/scripts/download.sh already adds the v character itself:
KUBEFLOW_TAG=v${KUBEFLOW_VERSION}
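To make the effect of that KUBEFLOW_TAG line concrete, here is a tiny sketch that mirrors the shell string substitution in Python. The function name kubeflow_tag is made up for illustration; only the v prefix rule comes from the quoted download.sh line.

```python
def kubeflow_tag(kubeflow_version):
    """Mirror the shell line KUBEFLOW_TAG=v${KUBEFLOW_VERSION}."""
    return "v" + kubeflow_version


# Without the leading v, the tag matches the branch name.
print(kubeflow_tag("0.5-branch"))   # v0.5-branch

# With the leading v already present, the prefix is doubled.
print(kubeflow_tag("v0.5-branch"))  # vv0.5-branch
```

The second value is a tag with a doubled v, which does not match the v0.5-branch name used in the download URL above.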
Issue-Label Bot is automatically applying the label kind/bug to this issue, with a confidence of 0.84.
@amarrella Can you switch to an older version temporarily?
The latest change for PodDefault and AdmissionController is not compatible with the current AWS settings.
@amarrella I mean use the v0.5-branch kfctl.sh.
Then edit
${KUBEFLOW_SRC}/scripts/aws/util.sh
to add the following changes before you run apply k8s:
https://github.com/kubeflow/kubeflow/pull/3340/files
I will cherry-pick this PR back to 0.5.