ray: [Core] ray.init() hangs/fails after "Started a local Ray instance."
What happened + What you expected to happen
Running the following snippet will hang indefinitely:
```python
>>> import ray
>>> ray.init()
2023-01-24 11:44:47,741 INFO worker.py:1538 -- Started a local Ray instance.
```
Sometimes it will fail instead:
```
[2023-01-24 11:50:22,050 E 31652 31652] core_worker.cc:179: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory
```
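When `ray.init()` hangs or the worker fails to register with the raylet, the raylet and GCS server logs may say more than the driver output does. Below is a minimal sketch for printing their tails. It assumes Ray's default session directory `/tmp/ray/session_latest/logs` and log file names beginning with `raylet` and `gcs_server`; both are assumptions about this setup, and the helper name `tail_logs` is hypothetical.

```python
from pathlib import Path

# Assumption: Ray's default log location. It differs if a custom
# temp directory was configured for the session.
DEFAULT_LOG_DIR = Path("/tmp/ray/session_latest/logs")


def tail_logs(log_dir=DEFAULT_LOG_DIR, patterns=("raylet*", "gcs_server*"), n_lines=20):
    """Return {file name: last n_lines lines} for log files matching patterns."""
    tails = {}
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        # No session directory: Ray never started here or uses another path.
        return tails
    for pattern in patterns:
        for path in sorted(log_dir.glob(pattern)):
            if path.is_file():
                lines = path.read_text(errors="replace").splitlines()
                tails[path.name] = lines[-n_lines:]
    return tails


if __name__ == "__main__":
    for name, lines in tail_logs().items():
        print(f"==> {name} <==")
        print("\n".join(lines))
```

An empty result means the directory was not found, which would itself be worth mentioning in the report.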
Versions / Dependencies
Python 3.9.13
Ray 2.2.0 (installed with pip install --upgrade ray[rllib])
grpcio 1.43.0
OS: CentOS Linux 7
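To confirm which versions are actually installed in the active environment (the grpcio version comes up again in the comments), here is a small sketch using only the standard library. The helper name `installed_versions` is illustrative and not part of Ray.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("ray", "grpcio")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions())
```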
Reproduction script
import ray
ray.init()
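Because the hang blocks the interpreter, it can help to run the reproduction in a child process with a timeout. The following is a sketch using only the standard library; `run_with_timeout` and the 60-second default are illustrative choices, not part of Ray.

```python
import subprocess
import sys


def _to_text(data):
    """Normalize captured output, which may be None, bytes, or str."""
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode(errors="replace")
    return data


def run_with_timeout(code, timeout_s=60.0):
    """Run `code` in a fresh interpreter and return (status, combined output).

    status is "ok", "failed" (non-zero exit), or "timeout".
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        return "timeout", _to_text(exc.stdout) + _to_text(exc.stderr)
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr


# Example call for the script above (not executed here):
# status, output = run_with_timeout("import ray; ray.init()", timeout_s=60)
# print(status, output, sep="\n")
```

On a timeout only the child interpreter is killed. Ray processes it already started may keep running and may need `ray stop` to clean up.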
Issue Severity
High: It blocks me from completing my task.
About this issue
- State: open
- Created a year ago
- Comments: 15 (6 by maintainers)
Try installing grpcio version 1.48.1; it worked for me. My environment is as follows:
Hey! So: I cannot run the `ulimit` as I don’t have root/sudo access.
Interesting - from the core worker logs, it seems some OS resources are not available.
How many threads is a process allowed to create on your system? (I guess this could be obtained through something like `cat /proc/sys/kernel/threads-max`.)
Could you try running `ulimit -n 65536` before starting the Python program?
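The limits asked about above can also be read from Python, and the soft open-file limit can be raised up to the hard limit without root. Below is a Linux-only sketch using the standard `resource` module. The function names are hypothetical, and whether raising the limit resolves this issue is not established in the thread. Child processes inherit the limits, so the call would go before `ray.init()`.

```python
import resource
from pathlib import Path


def read_limits():
    """Return (soft, hard) limits for open files and processes/threads,
    plus the system-wide thread maximum if /proc exposes it."""
    limits = {
        "open_files": resource.getrlimit(resource.RLIMIT_NOFILE),
        "processes_threads": resource.getrlimit(resource.RLIMIT_NPROC),
    }
    threads_max = Path("/proc/sys/kernel/threads-max")
    limits["system_threads_max"] = (
        int(threads_max.read_text()) if threads_max.exists() else None
    )
    return limits


def raise_soft_nofile(target=65536):
    """Raise the soft open-file limit toward `target`, capped at the hard
    limit. Never lowers the limit. Returns the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            pass  # the kernel refused; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


# Intended order of calls (ray itself is not imported here):
# print(read_limits())
# raise_soft_nofile()
# import ray; ray.init()
```

If the hard limit is itself below 65536, only an administrator can raise it further, and the output of `read_limits()` would show that.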