ray: [Bug] Ray 1.7 and 1.8 ray.init hangs when some versions of the Python gRPC library are used, due to global symbol export
Search before asking
- I searched the issues and found no similar issues.
Ray Component
Ray Core
What happened + What you expected to happen
This is not a contribution.
After upgrading to Ray 1.7, we noticed that when we import pybind-wrapped C++ code that uses a different version of gRPC than the one bundled with Ray, we consistently get segfaults.
It turns out this is because Ray exports its gRPC symbols globally, making them visible to other shared objects such as _streaming.so. This export seems to have been added in https://github.com/ray-project/ray/pull/18490
When I comment out *grpc*; at https://github.com/ray-project/ray/blob/2367a2cb9033913b68b1230316496ae273c25b54/src/ray/ray_version_script.lds#L43 (so the symbols are not exported) and build Ray 1.7.0 locally, our code no longer segfaults.
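Why commenting out that one line helps can be illustrated with a simplified model of a GNU ld version script of the form `{ global: <patterns>; local: *; };`. This is a sketch under assumptions: the helper `is_exported`, the pattern lists, and the mangled symbol name below are hypothetical illustrations, not the contents of Ray's actual script, and real ld matching has extra precedence rules (exact names, `extern "C++"` blocks) that the model ignores.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_exported(symbol: str, global_patterns: Iterable[str]) -> bool:
    """Simplified model of a version script ending in a catch-all `local: *;`.

    A symbol stays in the dynamic symbol table only if it matches one of the
    glob patterns listed under `global:`; everything else falls through to
    `local: *;` and is hidden from other shared objects.
    """
    return any(fnmatchcase(symbol, pattern) for pattern in global_patterns)


# Hypothetical pattern lists, before and after commenting out *grpc*;
with_grpc = ["*ray*", "*grpc*"]
without_grpc = ["*ray*"]

# A hypothetical mangled C++ symbol from the grpc namespace (grpc::Server::Start).
symbol = "_ZN4grpc6Server5StartEv"
print(is_exported(symbol, with_grpc))     # True: the glob matches the mangled name
print(is_exported(symbol, without_grpc))  # False: falls through to local: *;
```

Note that the patterns match mangled names, so `*grpc*` catches every symbol whose mangled form contains "grpc", including everything in the `grpc` C++ namespace.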
Looking at https://github.com/ray-project/ray/pull/18870, it seems like *absl*; caused a similar issue and has already been removed, but @ashione mentions in https://github.com/ray-project/ray/pull/18870#issuecomment-926701935 that *grpc*; might be needed for some ray/streaming tests to pass.
Would it be possible for someone to look into this and remove *grpc*;? I think it would be preferable not to export global symbols for commonly used libraries like gRPC and absl, to avoid causing issues for anyone using Ray alongside C extensions built with other versions of these libraries.
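To see which gRPC (or absl) symbols a given shared object actually exports, one option is to list its dynamic symbol table with GNU `nm -D --defined-only` and filter by name. The sketch below assumes GNU binutils `nm` is on the PATH and that its default BSD output format ("address type name") is used; the helper names `parse_exported_symbols` and `exported_symbols_in` are hypothetical, and the parser deliberately simplifies nm's symbol-type rules.

```python
import subprocess
from typing import List


def parse_exported_symbols(nm_output: str, needle: str) -> List[str]:
    """Return defined global symbols from `nm -D` output whose name contains `needle`.

    Assumes nm's default BSD format: "<address> <type> <name>". Uppercase type
    letters mark global symbols; "U" (undefined) is skipped. Simplification:
    lowercase "u" (unique global) symbols are ignored.
    """
    matches = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            # Undefined symbols have no address column; skip them and blank lines.
            continue
        _address, sym_type, name = parts
        if sym_type.isupper() and sym_type != "U" and needle in name:
            matches.append(name)
    return matches


def exported_symbols_in(shared_object: str, needle: str = "grpc") -> List[str]:
    """Run GNU nm on a shared object and filter its exported dynamic symbols."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", shared_object],
        capture_output=True, text=True, check=True,
    )
    return parse_exported_symbols(result.stdout, needle)
```

For example, `exported_symbols_in(path, "absl")` would list exported absl symbols for whatever `.so` file `path` points to; an empty list suggests the version script is keeping that library's symbols local.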
This is currently blocking my team from upgrading to ray 1.7+.
Thanks!
Versions / Dependencies
ray==1.7.0
Reproduction script
N/A
Anything else
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
About this issue
- State: closed
- Created 3 years ago
- Reactions: 4
- Comments: 15 (12 by maintainers)
This is fixed in master, and the change will be included in 1.9.
I see, I was on gcc 7. It's a bit tricky for me to upgrade gcc right now, but I saw the PR landed 23 hours ago, so I tried the nightly wheel from https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp36-cp36m-manylinux2014_x86_64.whl, assuming the fix made it into the latest build.
Happy to report I didn’t get a segfault 🎉 😄
@rkooo567 sorry, I have no idea how I did that; it was accidental.
Glad to hear that! I will close the issue. @scv119 @ericl we should consider a patch release with this commit. I've seen other users run into the same issue. Also, we should consider adding a code owner to the src/ray/ray_version_script.lds file.
Cc @scv119 I think we should handle this (and probably fix the streaming tests that aren't passing).