ray: [Bug] [Core] Connection to GCS hanging (grpc with ray==1.7.0 issues)
Search before asking
- I searched the issues and found no similar issues.
Ray Component
Ray Core
What happened + What you expected to happen
When upgrading from ray==1.6.0 to ray==1.7.0, my team has noticed that ray.init() and ray start --head both hang indefinitely.
Reverting to 1.6.0 resolves the issue. Bisecting through the commit history suggests that the bug was introduced on September 13th.
To test this we ran:
pip3 install -U https://s3-us-west-2.amazonaws.com/ray-wheels/master/$HASH/ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl
python3 -c "import ray; ray.init()" # <-- This hangs indefinitely! ray start --head also hangs...
We found that a67b9ee8d74b1c4a9dff1df4c225f4325e0f42a7 works but 3bc5f0501f3d573daf49d35ca836e65d1dcbc9ea does not. Since I think the ray-wheels builds are packaged by day (correct me if I am wrong), I believe the regression is in one of the commits from September 13th, of which there are a decent number: https://github.com/ray-project/ray/commits/master?after=9ca34c7192cf3efbd32275c8cb3a6c98fc56ce60+348&branch=master
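For anyone who wants to narrow this down further, the bisection described above could be sketched roughly as follows. This is only an illustration, not the exact procedure we used: `first_bad_commit` and `command_completes` are hypothetical helper names. It assumes the commits are ordered oldest to newest, that the first is known good and the last known bad, and that a timeout or non-zero exit counts as "bad".

```python
import subprocess
import sys
from typing import Callable, Sequence


def command_completes(python_code: str, timeout_s: float = 60.0) -> bool:
    """Run python_code in a fresh interpreter; False if it hangs or fails."""
    try:
        subprocess.run([sys.executable, "-c", python_code],
                       timeout=timeout_s, check=True)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        # A timeout is treated as a hang; a non-zero exit is also "bad".
        return False


def first_bad_commit(commits: Sequence[str],
                     is_good: Callable[[str], bool]) -> str:
    """Binary search for the earliest bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]
```

In practice `is_good` would install the wheel for the given commit (as in the `pip3 install` command above) and then call something like `command_completes("import ray; ray.init()")`. This only works for commits that actually have a wheel published.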
By investigating the stack trace, we found that the Python process was hanging indefinitely at:
self.global_state_accessor.connect() in the ray/state.py file
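For reference, one way to capture a stack trace like this from a hanging call is the standard-library `faulthandler` module. The snippet below is a minimal sketch, not necessarily how we obtained our trace. `run_with_hang_dump` is a hypothetical helper name, and the timeout and dump path are arbitrary choices.

```python
import faulthandler
from typing import Callable


def run_with_hang_dump(fn: Callable[[], object],
                       timeout_s: float,
                       dump_path: str) -> object:
    """Call fn(); if it is still running after timeout_s seconds,
    write the stack traces of all threads to dump_path."""
    with open(dump_path, "w") as dump_file:
        # Schedule the dump; the process keeps running (exit=False).
        faulthandler.dump_traceback_later(timeout_s, exit=False, file=dump_file)
        try:
            return fn()
        finally:
            # Cancel the pending dump before the file is closed.
            faulthandler.cancel_dump_traceback_later()
```

With Ray installed, this could be called as `run_with_hang_dump(ray.init, 60, "ray_init_hang.txt")`. If the call really hangs forever, the function never returns, but the dump file should still be written once the timeout expires.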
There does not seem to be anything interesting in the logs either… everything seems to be up and running. I was able to communicate with the Redis server, and all the expected keys seemed to be there with the correct values.
When I run ray start --head (which hangs) and then try to connect from another machine, that connection also hangs.
Reproduction script
I am sure this issue is not happening for many other users; my team works in a slightly unusual environment.
It would be hard to reproduce, but hopefully the commit hash isolation documented above helps.
Anything else
This is a consistent bug that is preventing us not only from upgrading to 1.7.0 but also from using Python 3.9.
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 28 (28 by maintainers)
Downgrading the priority, as this is an environment-specific issue. I will keep communicating with him to resolve it.