jupyter-server-proxy: Spark UI not accessible
Hi everybody,
As @ryanlovett asked me to, I opened this issue here, related to jupyterhub/zero-to-jupyterhub-k8s#1030. The problem is as follows:
After starting PySpark I am not able to access the Spark UI; attempts result in a JupyterHub 404 error. Here is (hopefully) all the relevant information:
- I create a new user image from the jupyter/pyspark-notebook image.
- The Dockerfile for this image contains:

```dockerfile
FROM jupyter/pyspark-notebook:5b2160dfd919
RUN pip install nbserverproxy
RUN jupyter serverextension enable --py nbserverproxy
USER root
RUN echo "$NB_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/notebook
USER $NB_USER
```
- I create the `SparkContext()` in a pod created with the respective image, which gives me the link to the UI.
- The `SparkContext()` is created with the following config:

```python
conf.setMaster('k8s://https://' + os.environ['KUBERNETES_SERVICE_HOST'] + ':443')
conf.set('spark.kubernetes.container.image', 'idalab/spark-py:spark')
conf.set('spark.submit.deployMode', 'client')
conf.set('spark.executor.instances', '2')
conf.setAppName('pyspark-shell')
conf.set('spark.driver.host', '10.16.205.42')
os.environ['PYSPARK_PYTHON'] = 'python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3'
```
- The link created by Spark is obviously not accessible on the hub, as it points to `<POD_IP>:4040`.
- I try to access the UI via `.../username/proxy/4040` and `.../username/proxy/4040/`; both don't work and lead to a JupyterHub 404.
- Other ports are accessible via this method, so I assume nbserverproxy is working correctly.
- This is the output of `netstat -pl`:
```
Proto Recv-Q Send-Q Local Address       Foreign Address State  PID/Program name
tcp   0      0      localhost:54695     0.0.0.0:*       LISTEN 23/python
tcp   0      0      localhost:33896     0.0.0.0:*       LISTEN 23/python
tcp   0      0      localhost:34577     0.0.0.0:*       LISTEN 23/python
tcp   0      0      localhost:52211     0.0.0.0:*       LISTEN 23/python
tcp   0      0      0.0.0.0:8888        0.0.0.0:*       LISTEN 7/python
tcp   0      0      localhost:39388     0.0.0.0:*       LISTEN 23/python
tcp   0      0      localhost:39971     0.0.0.0:*       LISTEN 23/python
tcp   0      0      localhost:32867     0.0.0.0:*       LISTEN 23/python
tcp6  0      0      jupyter-hagen:43878 [::]:*          LISTEN 45/java
tcp6  0      0      [::]:4040           [::]:*          LISTEN 45/java
tcp6  0      0      localhost:32816     [::]:*          LISTEN 45/java
tcp6  0      0      jupyter-hagen:41793 [::]:*          LISTEN 45/java
```
One can see that the Java processes listen on tcp6 sockets, unlike the Python processes.
- To check whether this is the cause, I set the environment variable `_JAVA_OPTIONS` to `"-Djava.net.preferIPv4Stack=true"`. This results in the following output but does not resolve the problem:
```
Proto Recv-Q Send-Q Local Address       Foreign Address State  PID/Program name
tcp   0      0      localhost:54695     0.0.0.0:*       LISTEN 456/python
tcp   0      0      0.0.0.0:4040        0.0.0.0:*       LISTEN 475/java
tcp   0      0      localhost:33896     0.0.0.0:*       LISTEN 456/python
tcp   0      0      localhost:34990     0.0.0.0:*       LISTEN 475/java
tcp   0      0      localhost:36079     0.0.0.0:*       LISTEN 456/python
tcp   0      0      jupyter-hagen:35119 0.0.0.0:*       LISTEN 475/java
tcp   0      0      localhost:34577     0.0.0.0:*       LISTEN 456/python
tcp   0      0      jupyter-hagen:42195 0.0.0.0:*       LISTEN 475/java
tcp   0      0      localhost:34836     0.0.0.0:*       LISTEN 456/python
tcp   0      0      0.0.0.0:8888        0.0.0.0:*       LISTEN 7/python
tcp   0      0      localhost:39971     0.0.0.0:*       LISTEN 456/python
tcp   0      0      localhost:32867     0.0.0.0:*       LISTEN 456/python
```
- I checked whether the UI is generally accessible by running a local version of the user image on my PC and forwarding the port. That works fine!
My user image is available on Docker Hub at idalab/spark-user:1.0.2, so it should be easy to pull for debugging if necessary.
Thank you for your help.
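For anyone hitting this: one approach discussed later in the thread is Spark's `spark.ui.proxyBase` setting. Here is a minimal sketch (not from this thread's code) of building that value from JupyterHub's environment; `JUPYTERHUB_SERVICE_PREFIX` is set by JupyterHub in the user's environment, e.g. `/user/hagen/`, and the fallback used here is only an illustrative placeholder:

```python
import os

# Sketch: build the proxyBase value so Spark UI links resolve behind
# jupyter-server-proxy. The fallback "/user/hagen/" is a placeholder;
# in a real pod JUPYTERHUB_SERVICE_PREFIX is provided by JupyterHub.
def spark_proxy_base(port=4040):
    prefix = os.environ.get('JUPYTERHUB_SERVICE_PREFIX', '/user/hagen/')
    return prefix.rstrip('/') + '/proxy/' + str(port)

# Before creating the SparkContext, one would then set:
#   conf.set('spark.ui.proxyBase', spark_proxy_base())
print(spark_proxy_base())
```

Whether this alone fixes every UI page depends on the Spark version, as the later comments about `utils.js` show.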
About this issue
- State: open
- Created 6 years ago
- Comments: 40 (1 by maintainers)
Commits related to this issue
- [SPARK-33579][UI] Fix executor blank page behind proxy ### What changes were proposed in this pull request? Fix some "hardcoded" API urls in Web UI. More specifically, we avoid the use of `location.... — committed to apache/spark by deleted user 4 years ago
- Fix that Web UI always correctly get appId Web UI does not correctly get appId when it has `proxy` or `history` in URL. In my case, it's happened on `https://jupyterhub.hosted.our/my-name/proxy/404... — committed to ornew/spark by ornew 3 years ago
Thanks for the documentation. I did as said above and it all works fine except the Executors tab in the Spark UI. It seems that the proxy replaces the [app-id] with the port instead of the actual app id.
From https://spark.apache.org/docs/latest/monitoring.html: `/applications/[app-id]/allexecutors` | A list of all (active and dead) executors for the given application.
The problem that the allexecutors endpoint returns 404 can be fixed by modifying `core/src/main/resources/org/apache/spark/ui/static/utils.js`. For example, our hub URL includes `jupyter` in the URL. But basically this problem could be fixed simply if the jupyter-server-proxy extension supported modification of the `proxy/` URL infix, since Spark's JavaScript functions in the UI try to handle the `proxy` string in the URL, as you can see in the code above. Is it possible to modify the `proxy` string infix in the URL for the jupyter-server-proxy extension (e.g., by setting some options)? I searched the code of this repository, but could not find any hardcoded `proxy` string. The `proxy` string might come from `jupyter-server` extensions or somewhere outside of this repository 😦

Also looking for an update.
This is partially addressed by 50e0358. Visiting /hub/user/proxy/4040/ still takes you to /jobs/, but I think that is the web UI. However, visiting /hub/user/proxy/4040/{jobs,environment,…}/ does the right thing without requiring the proxyBase setting.
JupyterHub's proxy allows us to create named servers; I was able to access the Executors tab with the following traitlets config.
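The actual config from this comment was not captured in the thread. A minimal sketch of what such an entry could look like, assuming jupyter-server-proxy's `ServerProxy.servers` traitlet (the name `spark_ui` and port 4040 come from the surrounding comments; whether a command-less entry is accepted depends on the jupyter-server-proxy version):

```python
# jupyter_notebook_config.py — a sketch, not the commenter's actual config.
c.ServerProxy.servers = {
    'spark_ui': {
        # The Spark UI is already started by the SparkContext on port 4040,
        # so we only map the named route to the known port here.
        'port': 4040,
        'launcher_entry': {'title': 'Spark UI'},
    }
}
```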
Then you will be able to access the Spark UI under $HUB_URL/spark_ui/jobs/ without the problematic proxy keyword.
So, the issue is in Spark Core.
See the utility: https://github.com/apache/spark/blob/c2d0d700b551e864bb7b2ae2a175ec8ade704488/core/src/main/resources/org/apache/spark/ui/static/utils.js#L88 .
`getStandAloneAppId` will always return the value after "proxy", which in our case is the port, 4040.
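To make the failure mode concrete, here is that utils.js logic re-expressed as a Python sketch (the real code is JavaScript; the function name and fallback here are illustrative):

```python
# Sketch of the pre-fix utils.js behavior: when the UI is served under
# .../proxy/<port>/..., the word "proxy" matches first and the segment
# after it is the *port*, not the application id.
def app_id_from_path(path):
    words = path.split('/')
    if 'proxy' in words:
        return words[words.index('proxy') + 1]
    if 'history' in words:
        return words[words.index('history') + 1]
    return None  # standalone mode: the real code falls back to a REST call

print(app_id_from_path('/user/hagen/proxy/4040/executors/'))  # -> '4040'
```

This is why the REST calls from the Executors page end up requesting `/applications/4040/allexecutors` and 404.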
@yuvipanda Thanks for the help! It still doesn't work.
I think you misspelled it in setup.py. It should be `jupyter_sparkui_proxy/etc/jupyter-sparkui-proxy-serverextension.json` instead of `jupyter_server_proxy/etc/jupyter-server-proxy-serverextension.json`.
I'm running in my Dockerfile:

```dockerfile
ADD common/jupyter-sparkui-proxy /jupyter-sparkui-proxy
RUN cd /jupyter-sparkui-proxy && python setup.py install
```

The installation looks correct, but I'm still getting the same error as above.
Looks like the URL can be changed with SPARK_PUBLIC_DNS. I tried it and changed it to `<JUPYTERHUB_URL>/hub/user/<username>/proxy/4040/jobs/`. This changes `sc.uiWebUrl` to `<JUPYTERHUB_URL>/hub/user/<username>/proxy/4040/jobs/:4040`, resulting in a link that actually redirects to the web app, but the app is still broken and links point to `<JUPYTERHUB_URL>/<XYZ>`.
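That trailing `:4040` is consistent with Spark treating SPARK_PUBLIC_DNS as a bare hostname and appending the UI port itself, roughly like this simplified sketch (not Spark's actual code):

```python
# Simplified sketch of how a "public DNS" value plus the UI port become the
# advertised URL. Passing a full proxy *path* as the host is what produces
# the broken ".../jobs/:4040" seen above.
def ui_web_url(public_dns, port=4040):
    return '{}:{}'.format(public_dns, port)

print(ui_web_url('<JUPYTERHUB_URL>/hub/user/<username>/proxy/4040/jobs/'))
```

So SPARK_PUBLIC_DNS can rewrite the host part of the link, but it cannot encode a proxy path; that is what `spark.ui.proxyBase` is for.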