ray: [Core] Error on Windows with virtual environment: gRPC unavailable
What happened + What you expected to happen
When running Ray locally on Windows 10:
ray.init()
# Some distributed task
tune.run(...)
it runs for 30 seconds and then terminates with the following error:
2022-04-15 18:40:10,052 ERROR ray_trial_executor.py:102 -- An exception occurred when trying to stop the Ray actor:Traceback (most recent call last):
File "<python>\lib\site-packages\ray\tune\ray_trial_executor.py", line 93, in post_stop_cleanup
ray.get(future, timeout=0)
File "<python>\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "<python>\lib\site-packages\ray\worker.py", line 1811, in get
raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
class_name: ImplicitFunc
actor_id: c422fd8fc896ec3e3073fb2201000000
pid: 17256
namespace: 88cbc294-b174-407a-98b8-4d963422ff38
ip: 127.0.0.1
The actor is dead because its owner has died. Owner Id: 01000000ffffffffffffffffffffffffffffffffffffffffffffffff Owner Ip address: 127.0.0.1 Owner worker exit type: NODE_DIED
Traceback (most recent call last):
File "<redacted program>", line 125, in <module>
asyncio.run(main())
File "c:\python39\lib\asyncio\runners.py", line 44, in run
return loop.run_until_complete(main)
File "c:\python39\lib\asyncio\base_events.py", line 642, in run_until_complete
return future.result()
File "<redacted program>", line 75, in main
analysis = tune.run(
File "<python>\lib\site-packages\ray\tune\tune.py", line 695, in run
raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [_tune_train_9582e_00000])
Looking through the Ray logs, the gRPC "unavailable" error appears multiple times across different log files. The most detailed instance is in dashboard.log:
2022-04-15 18:40:04,364 ERROR node_head.py:259 -- Error updating node stats of 5241012cc8ceeec135c64c837765766475f2416a95c21496eabd871c.
Traceback (most recent call last):
File "<python>\lib\site-packages\ray\dashboard\modules\node\node_head.py", line 250, in _update_node_stats
reply = await stub.GetNodeStats(
File "<python>\lib\site-packages\grpc\aio\_call.py", line 290, in __await__
raise _create_rpc_error(self._cython_call._initial_metadata,
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1650040804.364000000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3135,"referenced_errors":[{"created":"@1650040804.364000000","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}"
>
It fails even with the Windows firewall disabled. The same code does work when run in WSL Ubuntu-18.04.
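For anyone trying to narrow this down, the two observations above (the error recurring across log files, and the firewall not being the cause) could be checked with something like the sketch below. It is not part of Ray; `count_marker` and `can_connect` are names I made up, and both the log directory and the port are placeholders that have to be supplied for the session being debugged.

```python
import socket
from pathlib import Path


def count_marker(log_dir, marker="StatusCode.UNAVAILABLE"):
    """Map each log file name in log_dir to the number of lines containing marker."""
    counts = {}
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # Log files may contain bytes that are not valid UTF-8, so replace them.
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = sum(1 for line in text.splitlines() if marker in line)
        if hits:
            counts[path.name] = hits
    return counts


def can_connect(host, port, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `count_marker` on the session's log directory shows which files mention the error and how often. `can_connect("127.0.0.1", port)` only tests whether a TCP connection can be opened at all, so a True result does not prove that gRPC itself is healthy.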
Versions / Dependencies
Relevant Python 3.9.7 libraries explicitly installed via Pipenv 2020.11.15:
ray[default, tune]==1.12.0
Windows 10 Pro version: 19044.1645
Reproduction script
In my case, I used a Pipenv virtual environment, but I got the same result when testing with venv:
pip install --user pipenv
# Need to manually install redis for ray to start
pipenv install ray[default] redis
pipenv run python ./main.py
Contents of main.py:
import time
import ray
ray.init()
time.sleep(30)
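To see roughly when the node goes away during those 30 seconds, a variant of main.py along these lines might help. This is only a sketch: `count_alive` and `watch_nodes` are hypothetical helper names, and it assumes each entry returned by `ray.nodes()` carries an "Alive" flag.

```python
import time


def count_alive(nodes):
    """Count entries whose "Alive" flag is truthy in a ray.nodes()-style list."""
    return sum(1 for node in nodes if node.get("Alive"))


def watch_nodes(duration_s=30, interval_s=1.0):
    """Start Ray, then print the number of alive nodes once per interval."""
    import ray  # imported here so count_alive can be used without Ray installed

    ray.init()
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        elapsed = time.monotonic() - start
        try:
            print(f"{elapsed:5.1f}s alive nodes: {count_alive(ray.nodes())}")
        except Exception as exc:
            # The query itself may fail once the node is gone.
            print(f"{elapsed:5.1f}s ray.nodes() failed: {exc!r}")
        time.sleep(interval_s)
```

Calling `watch_nodes()` in place of the `time.sleep(30)` script keeps the same 30-second window but prints a line every second, so the time of the failure can be matched against the timestamps in the logs.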
Issue Severity
Medium: It is a significant difficulty but I can work around it.
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 26 (14 by maintainers)
@mattip I’ve updated the original post with a shorter reproduction script and the installation steps to recreate the environment.
Edit: Also, uninstalling Python 3.10 and removing the two PATH entries didn’t help.