jupyter-server-proxy: Spark UI not accessible

Hi everybody,

As @ryanlovett asked, I am opening this issue here; it is related to jupyterhub/zero-to-jupyterhub-k8s#1030. The problem is as follows:

After starting PySpark I am not able to access the Spark UI; the request results in a JupyterHub 404 error. Here is (hopefully) all the relevant information:

  1. I create a new user image from the jupyter/pyspark-notebook image.
  2. The Dockerfile for this image contains:
FROM jupyter/pyspark-notebook:5b2160dfd919
RUN pip install nbserverproxy
RUN jupyter serverextension enable --py nbserverproxy
USER root
RUN echo "$NB_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/notebook
USER $NB_USER
  1. I create the SparkContext() in a pod started from this image, which gives me the link to the UI.
  2. The SparkContext() is created with the following config:
import os
from pyspark import SparkConf
conf = SparkConf()
# Kubernetes API server of the cluster the notebook pod runs in
conf.setMaster('k8s://https://'+ os.environ['KUBERNETES_SERVICE_HOST'] +':443')
conf.set('spark.kubernetes.container.image', 'idalab/spark-py:spark')
conf.set('spark.submit.deployMode', 'client')
conf.set('spark.executor.instances', '2')
conf.setAppName('pyspark-shell')
# IP of the notebook pod, which acts as the driver in client mode
conf.set('spark.driver.host', '10.16.205.42')
os.environ['PYSPARK_PYTHON'] = 'python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3'
  1. The link created by Spark is obviously not accessible on the hub as it points to <POD_IP>:4040
  2. I try to access the UI via .../username/proxy/4040 and .../username/proxy/4040/. Neither works; both lead to a JupyterHub 404.
  3. Other ports are accessible via this method, so I assume nbserverproxy is working correctly.
  4. This is the output of netstat -pl:
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 localhost:54695         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:33896         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:34577         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:52211         0.0.0.0:*               LISTEN      23/python
tcp        0      0 0.0.0.0:8888            0.0.0.0:*               LISTEN      7/python
tcp        0      0 localhost:39388         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:39971         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:32867         0.0.0.0:*               LISTEN      23/python
tcp6       0      0 jupyter-hagen:43878     [::]:*                  LISTEN      45/java
tcp6       0      0 [::]:4040               [::]:*                  LISTEN      45/java
tcp6       0      0 localhost:32816         [::]:*                  LISTEN      45/java
tcp6       0      0 jupyter-hagen:41793     [::]:*                  LISTEN      45/java

Note that the java processes are listed with a different format because they listen on tcp6 sockets.

  1. To check whether this is the cause, I set the environment variable '_JAVA_OPTIONS' to "-Djava.net.preferIPv4Stack=true".

  2. This results in the following output but does not resolve the problem:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 localhost:54695         0.0.0.0:*               LISTEN      456/python
tcp        0      0 0.0.0.0:4040            0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:33896         0.0.0.0:*               LISTEN      456/python
tcp        0      0 localhost:34990         0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:36079         0.0.0.0:*               LISTEN      456/python
tcp        0      0 jupyter-hagen:35119     0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:34577         0.0.0.0:*               LISTEN      456/python
tcp        0      0 jupyter-hagen:42195     0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:34836         0.0.0.0:*               LISTEN      456/python
tcp        0      0 0.0.0.0:8888            0.0.0.0:*               LISTEN      7/python
tcp        0      0 localhost:39971         0.0.0.0:*               LISTEN      456/python
tcp        0      0 localhost:32867         0.0.0.0:*               LISTEN      456/python
  1. I checked whether the UI is accessible at all by running a local version of the user image on my PC and forwarding the port. That works fine!
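
A complementary check from inside the pod is to probe the port directly over IPv4 and IPv6, to see which loopback address the UI answers on. The following is only a minimal sketch using the Python standard library; the helper name `can_connect` and the one-second timeout are illustrative choices and are not part of nbserverproxy or Spark.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unavailable address families.
        return False


if __name__ == "__main__":
    # 4040 is the Spark UI port from the netstat output above.
    for host in ("127.0.0.1", "::1", "localhost"):
        print(host, can_connect(host, 4040))
```

If the UI answers on one address family but not the other, that would suggest the proxy and the UI disagree about which loopback address to use.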

My user image is available on Docker Hub at idalab/spark-user:1.0.2, so it should be easy to pull in for debugging if necessary.

Thank you for your help.

About this issue

  • State: open
  • Created 6 years ago
  • Comments: 40 (1 by maintainers)

Most upvoted comments

Thanks for the documentation. I did as said above and it all works fine except the Executors tab in Spark UI. It seems that the proxy replaces the [app-id] with the port instead of the actual app-id.

From https://spark.apache.org/docs/latest/monitoring.html:

/applications/[app-id]/allexecutors | A list of all (active and dead) executors for the given application.

The 404 returned by the allexecutors endpoint can be fixed by modifying core/src/main/resources/org/apache/spark/ui/static/utils.js. For example, our hub URL includes "jupyter", so we added one line to each of these functions:

function getStandAloneAppId(cb) {
  var words = document.baseURI.split('/');
  var ind = words.indexOf("proxy");
  if (document.baseURI.indexOf("jupyter") > 0) { ind = 0 }   //  newly added line
  ...

function createRESTEndPointForExecutorsPage(appId) {
  var words = document.baseURI.split('/');
  var ind = words.indexOf("proxy");
  if (document.baseURI.indexOf("jupyter") > 0) { ind = 0 }   //  newly added line
  ...

function createTemplateURI(appId, templateName) {
  var words = document.baseURI.split('/');
  var ind = words.indexOf("proxy");
  if (document.baseURI.indexOf("jupyter") > 0) { ind = 0 }   //  newly added line
  ...

More simply, this problem could be fixed if the jupyter-server-proxy extension supported changing the proxy/ URL infix, since the Spark UI's JavaScript functions give special treatment to the string proxy in the URL, as the code above shows.

Is it possible to change the proxy infix in the URL for the jupyter-server-proxy extension (e.g., by setting some option)? I searched the code of this repository but could not find any hardcoded proxy string. The proxy string might come from jupyter-server extensions or somewhere outside of this repository 😦

@ransoor2 Were you able to find a workaround for the blank spark UI ‘Executor’ tab? I have the same issue.

also looking for an update.

This is partially addressed by 50e0358. Visiting /hub/user/proxy/4040/ still takes you to /jobs/, but I think that redirect comes from the web UI itself. However, visiting /hub/user/proxy/4040/{jobs,environment,…}/ does the right thing without requiring the proxyBase setting.
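
For setups that still need the proxyBase setting mentioned above, the value is the path under which the proxied UI is served. The following is a minimal sketch, assuming the single-user server exposes its URL prefix in the JUPYTERHUB_SERVICE_PREFIX environment variable and that the proxied path has the form <prefix>proxy/<port>; the helper name `spark_ui_proxy_base` is hypothetical.

```python
import os


def spark_ui_proxy_base(service_prefix: str, port: int = 4040) -> str:
    """Build the path under which the proxied Spark UI is assumed to be served."""
    return f"{service_prefix.rstrip('/')}/proxy/{port}"


# On a hub, JUPYTERHUB_SERVICE_PREFIX looks like "/user/<username>/";
# the fallback "/" is only there so the sketch runs outside a hub.
prefix = os.environ.get("JUPYTERHUB_SERVICE_PREFIX", "/")
proxy_base = spark_ui_proxy_base(prefix)
print(proxy_base)
# The value would then be handed to Spark before the context is created,
# e.g. conf.set("spark.ui.proxyBase", proxy_base).
```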

jupyter-server-proxy allows us to register named servers. I was able to access the Executors tab with the following traitlets config:

c.ServerProxy.servers = {
    "spark_ui": {
        "port" : 4040,
        "absolute_url": False
    }
}

You will then be able to access the Spark UI under $HUB_URL/spark_ui/jobs/, without the problematic proxy keyword in the path.

So, the issue is in Spark Core.

See the utility: https://github.com/apache/spark/blob/c2d0d700b551e864bb7b2ae2a175ec8ade704488/core/src/main/resources/org/apache/spark/ui/static/utils.js#L88 .

function getStandAloneAppId(cb) {
  var words = document.baseURI.split('/');
  var ind = words.indexOf("proxy");
  if (ind > 0) {
    var appId = words[ind + 1];
    cb(appId);
    return;
  }
...

getStandAloneAppId will always pass the value after “proxy” to the callback, which in our case is the port, 4040.
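
To make the effect concrete, here is a small Python rendering of just the "proxy" branch of the function quoted above. It is a sketch for illustration only: the hub URLs are hypothetical, and the real function continues with further lookups where this one returns None.

```python
from typing import Optional


def standalone_app_id(base_uri: str) -> Optional[str]:
    """Python rendering of the "proxy" branch of getStandAloneAppId."""
    words = base_uri.split("/")
    # JavaScript's indexOf returns -1 when the item is missing.
    ind = words.index("proxy") if "proxy" in words else -1
    if ind > 0:
        return words[ind + 1]
    return None  # the real function goes on with other lookups here


# Hypothetical hub URLs, for illustration only.
print(standalone_app_id("https://hub.example.com/user/hagen/proxy/4040/executors/"))
# prints: 4040
print(standalone_app_id("https://hub.example.com/user/hagen/spark_ui/executors/"))
# prints: None
```

With the default proxy/4040 path the port is mistaken for the app-id, while a path without a proxy segment skips this branch entirely.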

@yuvipanda Thanks for help! Still doesn’t work.

  1. I think there is a typo in setup.py. It should be jupyter_sparkui_proxy/etc/jupyter-sparkui-proxy-serverextension.json instead of jupyter_server_proxy/etc/jupyter-server-proxy-serverextension.json.

  2. I'm running the following in my Dockerfile:

ADD common/jupyter-sparkui-proxy /jupyter-sparkui-proxy
RUN cd /jupyter-sparkui-proxy && python setup.py install

The installation looks correct, but I'm still getting the same error as above.

It looks like the URL can be changed with SPARK_PUBLIC_DNS. I tried setting it to <JUPYTERHUB_URL>/hub/user/<username>/proxy/4040/jobs/. This changes sc.uiWebUrl to <JUPYTERHUB_URL>/hub/user/<username>/proxy/4040/jobs/:4040, a link that actually redirects to the web app, but the app is still broken and its links point to <JUPYTERHUB_URL>/<XYZ>.
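
The trailing :4040 is consistent with the value being treated as a bare host name to which the UI port is appended. The sketch below mimics that concatenation. It is an assumption inferred from the sc.uiWebUrl value reported above, not a copy of Spark's code, and the function name and the example host are hypothetical.

```python
def ui_web_url(public_host: str, port: int = 4040, scheme: str = "http://") -> str:
    """Mimic the host:port concatenation suggested by the reported sc.uiWebUrl."""
    return f"{scheme}{public_host}:{port}"


# A plain host yields a usable URL.
print(ui_web_url("10.16.205.42"))
# A value that already contains a path gets the port appended after the path.
print(ui_web_url("hub.example.com/hub/user/hagen/proxy/4040/jobs/"))
```

Under that assumption, SPARK_PUBLIC_DNS can only change the host part of the link and cannot carry the proxy path.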