ray: [core][Bug] Ray processes escaping hermetic python environment

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Core

Issue Severity

Medium: It contributes to significant difficulty to complete my task, but I can work around it and get it resolved.

What happened + What you expected to happen

After ray.init(), the (grand)child Python processes escape our hermetic Python environment: they start looking for modules on the system instead of in our Bazel build sandbox.

This leads to actor failures:

(TemporaryActor pid=550)     import google.auth
(TemporaryActor pid=550) ModuleNotFoundError: No module named 'google.auth'
(TemporaryActor pid=544) 2022-03-09 23:15:03,378	ERROR worker.py:431 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::Trainer.__init__() (pid=544, ip=xx.xxx.xxx.xx)
(TemporaryActor pid=544) RuntimeError: The actor with name Trainer failed to import on the worker. This may be because needed library dependencies are not installed in the worker environment:

Looking at python/ray/_private/services.py, most Python subprocesses are started as {sys.executable} <some script>, without the -s and -S flags that would stop the interpreter from adding site-packages directories to the module search path.
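To see what the -s and -S flags change in a child interpreter, the sketch below launches sys.executable twice, once plainly and once with -s -S, and has each child report its own sys.flags, whether the site module was loaded, and which sys.path entries look like site-packages directories. It is a standalone illustration using only the standard library, not Ray code; the helper name probe_child is hypothetical.

```python
import json
import subprocess
import sys

# Code executed inside the child interpreter: report the -S / -s flags,
# whether `site` was imported at startup, and any site-packages-like paths.
_PROBE = (
    "import json, sys; "
    "print(json.dumps({"
    "'no_site': sys.flags.no_site, "
    "'no_user_site': sys.flags.no_user_site, "
    "'site_loaded': 'site' in sys.modules, "
    "'site_dirs': [p for p in sys.path if 'site-packages' in p or 'dist-packages' in p]"
    "}))"
)


def probe_child(extra_flags):
    """Hypothetical helper: run sys.executable with extra_flags, return its report."""
    out = subprocess.run(
        [sys.executable, *extra_flags, "-c", _PROBE],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    # Without flags the child normally imports `site` and picks up site-packages;
    # with -s -S it does neither (unless PYTHONPATH points at such a directory).
    print("plain:   ", probe_child([]))
    print("with -sS:", probe_child(["-s", "-S"]))
```

A worker started without these flags therefore sees whatever site-packages the interpreter's installation provides, which is how modules from outside a sandbox can leak in.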

Please let us know if there are any workarounds that can be applied to deal with this or code references that show this should not be happening.

Versions / Dependencies

ray==1.9.1, python==3.7 (via Bazel)

Reproduction script

Working on a shareable repro…

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 22 (14 by maintainers)

Most upvoted comments

Thanks @architkulkarni

Could you help me confirm my understanding of the gap between having just -s and having both -s and -S for your use case? I read through the -S documentation, which links to the site module docs. Is the problem that site adds packages from sys.prefix, which points to /usr/local and contains undesired packages? The docs mention that if you use a virtual environment, sys.prefix points to that environment instead. Is that a viable workaround?
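To check the sys.prefix question on a given setup, a small diagnostic like the one below can be run under the same interpreter the Ray workers use. It reports sys.prefix versus sys.base_prefix (they differ inside a venv) and the directories the site module would add. This is a sketch using only standard-library calls; describe_site_paths is a hypothetical name, and whether a venv is practical inside a Bazel hermetic toolchain is an assumption not settled here.

```python
import site
import sys


def describe_site_paths():
    """Hypothetical diagnostic: summarize where `site` looks for packages."""
    return {
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # Inside a venv, sys.prefix points at the venv while base_prefix
        # still points at the underlying installation (e.g. /usr/local).
        "in_venv": sys.prefix != sys.base_prefix,
        # Global site-packages directories derived from the prefixes.
        "site_packages": site.getsitepackages(),
        # ENABLE_USER_SITE can be None if site.main() never ran (e.g. under -S).
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "user_site": site.getusersitepackages(),
    }


if __name__ == "__main__":
    for key, value in describe_site_paths().items():
        print(f"{key}: {value}")
```

If site_packages lists /usr/local paths when run the way Ray launches workers, that matches the leak described above.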

Let me revisit my notes and retry the setup on

Would it be enough to have it just for the Ray workers (the start_worker_cmd)? One of the implementations I have in mind only works for Ray workers. If not, I can think of a different approach.

I think the start worker command and the java_command should be sufficient. When I was initially trying to fix the issue, I added the -sS flags wherever sys.executable was invoked. However, later testing showed that passing them only in the worker commands (start_worker and java_command) was sufficient.
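The worker-only change discussed here amounts to inserting -s -S right after the interpreter in an argv-style command list. The sketch below shows that transformation in isolation. It is not Ray's implementation: add_hermetic_flags, HERMETIC_FLAGS, and the example command are hypothetical, and how Ray actually assembles its worker and Java commands may differ.

```python
import sys

# -s: do not add the user site-packages directory to sys.path
# -S: do not import the site module (so no site-packages at all)
HERMETIC_FLAGS = ("-s", "-S")


def add_hermetic_flags(command, python_executable=sys.executable):
    """Hypothetical helper: insert -s/-S right after the Python interpreter.

    `command` is an argv-style list. If the interpreter path does not appear
    in it, a copy of the command is returned unchanged.
    """
    cmd = list(command)
    try:
        idx = cmd.index(python_executable)
    except ValueError:
        return cmd  # no Python interpreter in this command; leave it alone
    rest = cmd[idx + 1 :]
    # Interpreter options are the dash-prefixed args before the script path.
    leading = []
    for arg in rest:
        if not arg.startswith("-"):
            break
        leading.append(arg)
    missing = [flag for flag in HERMETIC_FLAGS if flag not in leading]
    return cmd[: idx + 1] + missing + rest


if __name__ == "__main__":
    worker_cmd = [sys.executable, "worker.py", "--some-arg"]  # illustrative only
    print(add_hermetic_flags(worker_cmd))
```

Skipping flags that are already present keeps the helper idempotent, so applying it twice to the same command does not duplicate -s or -S.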

BTW @architkulkarni, we should consider doing this via runtime envs rather than through the top-level API. But yeah, let's go through the API change process.

@ponner-github I see, thanks for running those tests. I’ll make a patch for this and try to get the API change approved.