kubeflow: Jupyterlab, Rstudio and VSCode do not run as non-root in Kubeflow 1.3

/kind bug

@DavidSpek

[julius@fedora 1.3]$ kubectl -n kubeflow-user logs pod/julius1-0
s6-overlay-preinit: fatal: unable to mkdir /var/run/s6: Permission denied

It also does not work with podman

podman run --user 1:0 public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-scipy:v1.3.0-rc.0 

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...s6-chown: fatal: unable to chown /var/run/s6/etc/cont-init.d/01-copy-tmp-home: Operation not permitted
s6-chmod: fatal: unable to change mode of /var/run/s6/etc/cont-init.d/01-copy-tmp-home: Operation not permitted
s6-chown: fatal: unable to chown /var/run/s6/etc/services.d/jupyterlab/run: Operation not permitted
s6-chmod: fatal: unable to change mode of /var/run/s6/etc/services.d/jupyterlab/run: Operation not permitted
exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 01-copy-tmp-home: executing... 
cp: cannot access '/tmp_home/jovyan/.jupyter': Permission denied
[cont-init.d] 01-copy-tmp-home: exited 1.
[cont-init.d] done.
[services.d] starting services
s6-supervise (child): fatal: unable to exec run: Permission denied
s6-supervise jupyterlab: warning: unable to spawn ./run - waiting 10 seconds
[services.d] done.
s6-supervise (child): fatal: unable to exec run: Permission denied
s6-supervise jupyterlab: warning: unable to spawn ./run - waiting 10 seconds
s6-supervise (child): fatal: unable to exec run: Permission denied
s6-supervise jupyterlab: warning: unable to spawn ./run - waiting 10 seconds
s6-supervise (child): fatal: unable to exec run: Permission denied

If the image cannot run as an arbitrary user, the user should be set in the StatefulSet itself. With UID 1000 the server does start:

podman run --user 1000:0 public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-scipy:v1.3.0-rc.0 
[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...s6-chown: fatal: unable to chown /var/run/s6/etc/cont-init.d/01-copy-tmp-home: Operation not permitted
s6-chown: fatal: unable to chown /var/run/s6/etc/services.d/jupyterlab/run: Operation not permitted
exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 01-copy-tmp-home: executing... 
[cont-init.d] 01-copy-tmp-home: exited 0.
[cont-init.d] done.
[services.d] starting services
[services.d] done.
[I 2021-04-06 11:16:26.827 ServerApp] jupyterlab | extension was successfully linked.
[I 2021-04-06 11:16:26.834 ServerApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/jupyter_cookie_secret
[I 2021-04-06 11:16:27.103 ServerApp] nbclassic | extension was successfully linked.
[W 2021-04-06 11:16:27.186 ServerApp] All authentication is disabled.  Anyone who can connect to this server will be able to run code.
[I 2021-04-06 11:16:27.203 LabApp] JupyterLab extension loaded from /opt/conda/lib/python3.8/site-packages/jupyterlab
[I 2021-04-06 11:16:27.203 LabApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 2021-04-06 11:16:27.207 ServerApp] jupyterlab | extension was successfully loaded.
[I 2021-04-06 11:16:27.224 ServerApp] nbclassic | extension was successfully loaded.
[I 2021-04-06 11:16:27.225 ServerApp] Serving notebooks from local directory: /home/jovyan
[I 2021-04-06 11:16:27.225 ServerApp] Jupyter Server 1.4.1 is running at:
[I 2021-04-06 11:16:27.225 ServerApp] http://1ecd45fc4e8d:8888/lab
[I 2021-04-06 11:16:27.225 ServerApp]  or http://127.0.0.1:8888/lab
[I 2021-04-06 11:16:27.225 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 41 (31 by maintainers)

Most upvoted comments

Our team encountered the error discussed here, but in a slightly different context (I think). We were able to resolve the issue; read on for more information.

tl;dr:

Our team has to use a custom securityContext in order to access an NFS share. This prevents us from running as the default jovyan user (our UID/GID are different).

We ran chmod 775 s6/services.d/jupyterlab/run before building our custom Jupyter image. This corresponds to the file https://github.com/kubeflow/kubeflow/blob/master/components/example-notebook-servers/jupyter/s6/services.d/jupyterlab/run in the kubeflow/kubeflow repo.

We also added chmod -R 775 /tmp_home to the relevant place in the jupyter Dockerfile while building our custom image, but it is not clear whether that had any impact on fixing the issue.

Problem Description / Environment

We created a custom notebook image (called custom-jupyter:1 in this discussion) based on the base, jupyter, and jupyter-tensorflow Dockerfiles. Our custom Dockerfile combines the key parts of these into one, and reorganizes some of the steps slightly, but overall it is essentially the same logic condensed into a single file. We were able to spin up an instance of this custom image and access JupyterLab without problems.

We are using Kubeflow 1.4.1 on Kubernetes 1.21.9. The Kubernetes cluster was provisioned through Rancher v2.6.3-patch1. We are using Docker 20.10.12 as our container engine under the hood.

We need to mount an NFS share, and to do that we must set a securityContext that changes the UID and GID. We currently do this by injecting the securityContext block and the relevant NFS volume mounts via kubectl patch. For reference, assume we spun up a notebook called test, which creates pod test-0; here is a heavily redacted example of what kubectl get -n myuser -o yaml pod test-0 gives us, showing the injected data in the securityContext, volumeMounts, and volumes sections:

(...)
spec:
  containers:
  - (...)
    image: custom-jupyter:1
    name: test
    (...)
    securityContext:
      runAsGroup: 1234
      runAsUser: 1234
    (...)
    volumeMounts:
    - mountPath: /home/jovyan
      name: workspace-test
    - mountPath: /mnt/nfs_share
      name: my-nfs
    (...)
    workingDir: /home/jovyan
  (...)
  volumes:
  (...)
  - name: my-nfs
    nfs:
      path: /data
      server: my.nfs.server.path
(...)

After patching a notebook that runs our custom-jupyter:1 image, we are unable to connect to JupyterLab. As reported throughout this thread, clicking the “CONNECT” button in the Kubeflow UI leads to a page that just says:

upstream connect error or disconnect/reset before headers. reset reason: connection failure

Our kubectl logs output is also similar to the output shown above:

$ kubectl logs -n myuser test-0 test
[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
s6-chown: fatal: unable to chown /var/run/s6/etc/cont-init.d/01-copy-tmp-home: Operation not permitted
s6-chmod: fatal: unable to change mode of /var/run/s6/etc/cont-init.d/01-copy-tmp-home: Operation not permitted
s6-chown: fatal: unable to chown /var/run/s6/etc/services.d/jupyterlab/run: Operation not permitted
s6-chmod: fatal: unable to change mode of /var/run/s6/etc/services.d/jupyterlab/run: Operation not permitted
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 01-copy-tmp-home: executing...
cp: cannot access '/tmp_home/jovyan/.jupyter': Permission denied
[cont-init.d] 01-copy-tmp-home: exited 1.
[cont-init.d] done.
[services.d] starting services
s6-supervise (child): fatal: unable to exec run: Permission denied
s6-supervise jupyterlab: warning: unable to spawn ./run - waiting 10 seconds
[services.d] done.
s6-supervise (child): fatal: unable to exec run: Permission denied
s6-supervise jupyterlab: warning: unable to spawn ./run - waiting 10 seconds

(This error then repeats every 10 seconds, indefinitely.)

Problem Solution

Mounting an emptyDir to /var/run/s6 as suggested above did NOT fix the problem.

I zeroed in on those initial error messages from s6; specifically, I noticed that the permission change on the JupyterLab run file had failed. I went back to the source for our Docker image and saw that s6/services.d/jupyterlab/run was not executable. This file was copied as-is from the jupyter example image in the kubeflow/kubeflow repo: https://github.com/kubeflow/kubeflow/blob/2d347e97d37b290d5764e84fc26f4d9870ba06ce/components/example-notebook-servers/jupyter/s6/services.d/jupyterlab/run (this URL is pinned to what is currently the latest master commit; the file is identical to the one we’re using).

My assumption is that JupyterLab was failing to start because s6 failed to make the run script executable, and it wasn’t already executable to begin with.
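This failure mode can be reproduced locally without Kubernetes or s6. The sketch below uses a hypothetical helper, `try_exec` (not from this thread), that writes a trivial script, runs it with and without the execute bit, and prints the exit status. A shell reports 126 when a file exists but cannot be executed, which matches the "unable to exec run: Permission denied" message from s6-supervise. Even root cannot exec a regular file that has no execute bit set at all, so the result does not depend on the UID.

```shell
#!/usr/bin/env bash
# try_exec DIR MODE (hypothetical helper): write a trivial script to DIR/run,
# set MODE on it, try to execute it and print the resulting exit status.
try_exec() {
  local dir="$1" mode="$2"
  printf '#!/bin/sh\nexit 0\n' > "$dir/run"
  chmod "$mode" "$dir/run"
  "$dir/run" 2>/dev/null   # stderr would carry the "Permission denied" message
  echo "$?"
}

workdir="$(mktemp -d)"
echo "mode 644 -> exit $(try_exec "$workdir" 644)"   # 126: file found but not executable
echo "mode 775 -> exit $(try_exec "$workdir" 775)"   # 0: script runs normally
rm -rf "$workdir"
```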

To fix this, I ran the following in my local development environment:

cd custom-jupyter-docker/s6/services.d/jupyterlab/
chmod 775 run

That made the file executable.
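Rather than fixing one file at a time, a check like the following could be run before building the image to list every s6 script that is missing an execute bit. It is a sketch: the function name `list_non_executable` is hypothetical, and it assumes the layout of the Kubeflow example images (`services.d/<name>/run` and `cont-init.d/*` under an `s6/` directory). `! -perm -111` flags files lacking the execute bit for any of user, group or other, since the notebook process may run under a UID that owns neither the file nor its group.

```shell
#!/usr/bin/env bash
# list_non_executable DIR (hypothetical helper): print every s6 "run" script
# and every cont-init.d script under DIR that lacks the execute bit for at
# least one of user, group or other.
list_non_executable() {
  find "$1" -type f \( -name run -o -path '*/cont-init.d/*' \) ! -perm -111 -print
}

# Example: check the local s6 tree before building the image.
if [ -d custom-jupyter-docker/s6 ]; then
  list_non_executable custom-jupyter-docker/s6
fi
```

Once every script is executable the function prints nothing, so a CI job could fail the build whenever its output is non-empty.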

I also noticed this error in the kubectl logs:

[cont-init.d] 01-copy-tmp-home: executing...
cp: cannot access '/tmp_home/jovyan/.jupyter': Permission denied
[cont-init.d] 01-copy-tmp-home: exited 1.

After some investigation, it appears that once Jupyter is installed, that particular directory has 600 permissions. I don’t know whether this is even relevant to the issue at hand, but I also updated the Dockerfile to set mode 775 on everything in /tmp_home. (Something tighter, such as 664 for files, would probably do; directories still need the execute bit to be traversable, so a blanket recursive 664 would not work. Since 775 doesn’t seem to have broken anything, I will probably leave it as-is.)

RUN mkdir -p /tmp_home \
 && cp -r ${HOME} /tmp_home \
 && chown -R ${NB_USER}:users /tmp_home \
 && chmod -R 775 /tmp_home

This change corresponds to lines 70-72 here: https://github.com/kubeflow/kubeflow/blob/2d347e97d37b290d5764e84fc26f4d9870ba06ce/components/example-notebook-servers/jupyter/Dockerfile
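On the 664 question: a recursive `chmod -R 664` would strip the execute (search) bit from directories, so they could no longer be entered. chmod's capital `X` avoids this by granting execute only to directories and to files that already have an execute bit for someone. The sketch below (hypothetical helper `normalize_perms`, not tested against the Kubeflow images) applies `u=rwX,g=rwX,o=rX`, which yields 775 for directories and 664 for plain files. In the Dockerfile, the equivalent would be replacing `chmod -R 775 /tmp_home` with `chmod -R u=rwX,g=rwX,o=rX /tmp_home`.

```shell
#!/usr/bin/env bash
# normalize_perms DIR (hypothetical helper): recursively give directories
# 775 and plain files 664; files that already have an execute bit become 775.
normalize_perms() {
  chmod -R u=rwX,g=rwX,o=rX "$1"
}

demo="$(mktemp -d)"
mkdir -p "$demo/jovyan/.jupyter"
printf '{}\n' > "$demo/jovyan/.jupyter/jupyter_config.json"
chmod 600 "$demo/jovyan/.jupyter/jupyter_config.json"
chmod 700 "$demo/jovyan/.jupyter"
normalize_perms "$demo"
# Expect 775 for the directory and 664 for the file.
stat -c '%a %n' "$demo/jovyan/.jupyter" "$demo/jovyan/.jupyter/jupyter_config.json"
rm -rf "$demo"
```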

After this, I rebuilt the custom-jupyter Docker image. Deploying an instance of this new image along with the patches described above fixed the issue: I am able to access JupyterLab, and everything seems to work normally, despite the fact that I am running as a non-jovyan user with a different UID/GID.

I don’t 100% understand why this works, but I think that since Istio injects the GID 1337 into my user as an additional GID (so I have both my custom 1234 GID shown in my sample YAML, and the Istio 1337 GID), the fact that everything is group-accessible allows me to have full access to jovyan’s things, despite not being jovyan.
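This group-access hypothesis can be checked from inside the running container. `kubectl exec -n myuser test-0 -c test -- id` shows the UID, primary GID and supplementary groups the notebook process actually has, and the Dockerfile above chowns `/tmp_home` to `${NB_USER}:users`. The sketch below (hypothetical helper `group_has_access`) compares a path's owning group with `id -G` and checks the group read bit. It is deliberately simplified: it ignores the owner and other permission classes, ACLs, root's overrides and search permission on parent directories, so it only answers whether group permissions alone would grant read access to that one path.

```shell
#!/usr/bin/env bash
# group_has_access PATH (hypothetical helper): succeed if PATH's owning group
# is among the current process's groups (primary or supplementary) and the
# group read bit is set. Ignores owner/other classes, ACLs and root overrides.
group_has_access() {
  local path="$1" gid mode group_bits g
  gid="$(stat -c %g "$path")"
  mode="$(stat -c %a "$path")"
  # Octal mode digits are [special]owner,group,other; extract the group digit.
  group_bits=$(( (0$mode >> 3) & 7 ))
  for g in $(id -G); do
    if [ "$g" = "$gid" ] && [ $(( group_bits & 4 )) -ne 0 ]; then
      return 0
    fi
  done
  return 1
}

# Example (inside the notebook container):
# group_has_access /tmp_home/jovyan/.jupyter && echo "group permissions grant read"
```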

Recommendation

My recommendation for now would be for the Kubeflow dev team to mark the components/example-notebook-servers/jupyter/s6/services.d/jupyterlab/run file as executable and then commit that to the repo and rebuild the sample images. Consider also making the change to /tmp_home, although as stated it is not clear that that does anything useful.
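If the `run` file is indeed stored without the executable bit, Git can record the bit directly in the index with `git update-index --chmod=+x`, which also works on checkouts where `core.fileMode` is false. The sketch below demonstrates this on a throwaway repository; `staged_mode` is a hypothetical helper that reads the staged mode from `git ls-files -s`. In the kubeflow/kubeflow repo, the path would be `components/example-notebook-servers/jupyter/s6/services.d/jupyterlab/run`.

```shell
#!/usr/bin/env bash
# staged_mode REPO PATH (hypothetical helper): print the mode Git has staged
# for PATH, e.g. 100644 (regular file) or 100755 (executable).
staged_mode() {
  git -C "$1" ls-files -s -- "$2" | awk '{print $1}'
}

repo="$(mktemp -d)"
git -C "$repo" init -q
mkdir -p "$repo/s6/services.d/jupyterlab"
printf '#!/bin/sh\n' > "$repo/s6/services.d/jupyterlab/run"
chmod 644 "$repo/s6/services.d/jupyterlab/run"
git -C "$repo" add s6/services.d/jupyterlab/run
staged_mode "$repo" s6/services.d/jupyterlab/run   # 100644
# Record the executable bit in the index; commit as usual afterwards.
git -C "$repo" update-index --chmod=+x s6/services.d/jupyterlab/run
staged_mode "$repo" s6/services.d/jupyterlab/run   # 100755
rm -rf "$repo"
```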

I second @srikantt’s question! Does anybody have a hint for a workaround? Or can a pod deployed manually with this YAML (#5808 (comment)) be made accessible from the Kubeflow UI?

Yes: build your own JupyterLab image properly, without s6.