ray: [Core] ray.init() hangs/fails after "Started a local Ray instance."

What happened + What you expected to happen

Running the following snippet hangs indefinitely:

>>> import ray
>>> ray.init()
2023-01-24 11:44:47,741 INFO worker.py:1538 -- Started a local Ray instance.

Sometimes it fails instead:

[2023-01-24 11:50:22,050 E 31652 31652] core_worker.cc:179: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory

Versions / Dependencies

Python 3.9.13
Ray 2.2.0 (installed with pip install --upgrade ray[rllib])
grpcio 1.43.0
OS: CentOS Linux 7

Reproduction script

import ray
ray.init()
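
Since the snippet either hangs or fails, it can help to run it in a separate interpreter with a timeout so the two outcomes are easy to tell apart. The following is only a sketch: `run_with_timeout` is a hypothetical helper (not part of Ray), and the 60-second default is an arbitrary assumption.

```python
import subprocess
import sys


def run_with_timeout(code, timeout_s=60):
    """Run `code` in a fresh interpreter and classify the outcome.

    Returns (status, stderr), where status is 'ok', 'failed', or 'hung'.
    This helper is hypothetical; it is not part of Ray.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child interpreter was killed after timeout_s seconds.
        return "hung", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stderr


# Example use for this report (not executed here):
#   status, stderr = run_with_timeout("import ray; ray.init()")
#   print(status, stderr[-2000:])
```

Note that killing the child interpreter may leave Ray's background processes running; `ray stop` can be used to clean those up afterwards.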

Issue Severity

High: It blocks me from completing my task.

About this issue

  • State: open
  • Created a year ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Try installing grpcio version 1.48.1; it worked for me. My environment is as follows:

CentOS 7
Python 3.7.11
Ray 2.5.1
grpcio 1.48.1
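
To compare an environment against the versions listed above, the installed versions can be read from package metadata without importing Ray (which matters here, since starting Ray is what hangs). This is a minimal sketch: `installed_versions` is a hypothetical helper, and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("ray", "grpcio")):
    """Return {package name: version string, or None if not installed}.

    Hypothetical helper; it only reads package metadata and never
    imports the packages themselves.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

If grpcio turns out to differ, `pip install "grpcio==1.48.1"` pins the version suggested above. Whether that resolves the hang in other environments is not confirmed in this thread.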

Hey! So:

bash-4.2$ cat /proc/sys/kernel/threads-max
1025624

I cannot run ulimit, as I don’t have root/sudo access.

Interesting. From the core worker logs, it seems some OS resources are not available.

  1. How many threads is a process allowed to create on your system? (I guess this could be obtained with something like cat /proc/sys/kernel/threads-max.)

  2. Could you try running ulimit -n 65536 before starting the Python program?
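
Both checks can also be done from inside Python with the standard `resource` module, which may help when changing shell limits is awkward. Raising the soft open-file limit up to the existing hard limit does not need root; only raising the hard limit does. The sketch below assumes a Linux-like system; `read_threads_max` and `raise_soft_nofile` are hypothetical helper names, and whether a higher limit actually fixes this issue is untested.

```python
import resource
from pathlib import Path


def read_threads_max(path="/proc/sys/kernel/threads-max"):
    """Return the system-wide thread limit, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def raise_soft_nofile(target=65536):
    """Raise the soft open-file limit toward `target`, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards. Hypothetical helper.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, hard  # already at least as high as requested
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        pass  # the OS refused the change; report the current limits instead
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("threads-max:", read_threads_max())
    print("RLIMIT_NPROC (soft, hard):", resource.getrlimit(resource.RLIMIT_NPROC))
    print("RLIMIT_NOFILE (soft, hard):", raise_soft_nofile())
```

Resource limits are inherited by child processes, so calling `raise_soft_nofile()` before `ray.init()` in the same script should also apply to the processes a local Ray instance starts from it.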