ray: [Bug] Ray 1.7 and 1.8 ray.init hangs when some versions of the Python gRPC library are used, due to global symbol export

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Core

What happened + What you expected to happen

This is not a contribution.

After upgrading to Ray 1.7, we noticed that when we import pybind-wrapped C++ code that uses a different version of grpc than the one that comes with ray, we consistently get segfaults.

Turns out this is due to Ray exporting its grpc symbols so they are global and available to things like _streaming.so. This seems to have been added in https://github.com/ray-project/ray/pull/18490

When I comment out *grpc*; at https://github.com/ray-project/ray/blob/2367a2cb9033913b68b1230316496ae273c25b54/src/ray/ray_version_script.lds#L43 (so the symbols are not exported) and build ray 1.7.0 locally, our code does not segfault.
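A rough way to check whether a given build still exports grpc symbols is to list the shared object's defined dynamic symbols with `nm -D --defined-only` and filter for names containing `grpc`. The sketch below is illustrative only: the helper names `exported_symbols_matching` and `list_exported` are hypothetical, and it assumes GNU binutils `nm` is on the PATH.

```python
import subprocess
from typing import List


def exported_symbols_matching(nm_output: str, needle: str = "grpc") -> List[str]:
    """Filter `nm -D --defined-only` output for symbol names containing `needle`.

    Hypothetical helper for illustration; not part of Ray.
    """
    matches = []
    for line in nm_output.splitlines():
        # Defined symbols print as "<address> <type> <name>".
        # Undefined ones ("U <name>") have only two fields and are skipped.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        name = parts[2].strip()
        if needle.lower() in name.lower():
            matches.append(name)
    return matches


def list_exported(path: str, needle: str = "grpc") -> List[str]:
    """Run nm on a shared object and return matching exported symbol names."""
    out = subprocess.check_output(
        ["nm", "-D", "--defined-only", path], universal_newlines=True
    )
    return exported_symbols_matching(out, needle)
```

Calling `list_exported` on the path to Ray's compiled extension module in the installed package (the exact file name depends on the install layout) should return a non-empty list on a build that exports grpc, and ideally an empty one once `*grpc*;` is removed from the version script.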

Looking at https://github.com/ray-project/ray/pull/18870, it seems like *absl*; caused a similar issue and has already been removed, but @ashione mentions in https://github.com/ray-project/ray/pull/18870#issuecomment-926701935 that *grpc*; might be needed for some ray/streaming tests to pass.

Would it be possible for someone to look into this and remove *grpc*;? I think it would be preferable not to export global symbols for commonly used libs like grpc and absl, to avoid causing issues for anyone using ray alongside C extensions built with other versions of these libs.

This is currently blocking my team from upgrading to ray 1.7+.

Thanks!

Versions / Dependencies

ray==1.7.0

Reproduction script

N/A

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 15 (12 by maintainers)

Most upvoted comments

This is fixed in master, and the change will be included in 1.9.

I see, I was on gcc 7. It’s a bit tricky for me to upgrade my gcc right now, but I saw the PR landed 23 hours ago, so I tried the nightly wheel from https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp36-cp36m-manylinux2014_x86_64.whl, assuming the fix made it into the latest build.

Happy to report I didn’t get a segfault 🎉 😄

@rkooo567 sorry I have no idea how I did that, that was accidental

Glad to hear that! I will close the issue. @scv119 @ericl we should consider having a patch release with this commit. I’ve seen other users run into the same issue. Also, we should consider adding a code owner for the src/ray/ray_version_script.lds file.

Cc @scv119 I think we should handle this (and it is probably fine if the streaming tests do not pass).