ray: [Core] Error on Windows with virtual environment: gRPC unavailable

What happened + What you expected to happen

When running Ray locally on Windows 10:

import ray
from ray import tune

ray.init()

# Some distributed task
tune.run(...)

it runs for 30 seconds and then terminates with the following error:

2022-04-15 18:40:10,052	ERROR ray_trial_executor.py:102 -- An exception occurred when trying to stop the Ray actor:Traceback (most recent call last):
  File "<python>\lib\site-packages\ray\tune\ray_trial_executor.py", line 93, in post_stop_cleanup
    ray.get(future, timeout=0)
  File "<python>\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "<python>\lib\site-packages\ray\worker.py", line 1811, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
	class_name: ImplicitFunc
	actor_id: c422fd8fc896ec3e3073fb2201000000
	pid: 17256
	namespace: 88cbc294-b174-407a-98b8-4d963422ff38
	ip: 127.0.0.1
The actor is dead because its owner has died. Owner Id: 01000000ffffffffffffffffffffffffffffffffffffffffffffffff Owner Ip address: 127.0.0.1 Owner worker exit type: NODE_DIED

Traceback (most recent call last):
  File "<redacted program>", line 125, in <module>
    asyncio.run(main())
  File "c:\python39\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "c:\python39\lib\asyncio\base_events.py", line 642, in run_until_complete
    return future.result()
  File "<redacted program>", line 75, in main
    analysis = tune.run(
  File "<python>\lib\site-packages\ray\tune\tune.py", line 695, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [_tune_train_9582e_00000])

In the Ray logs, the gRPC "unavailable" error appears multiple times across different log files. The most detailed instance is in dashboard.log:

2022-04-15 18:40:04,364	ERROR node_head.py:259 -- Error updating node stats of 5241012cc8ceeec135c64c837765766475f2416a95c21496eabd871c.
Traceback (most recent call last):
  File "<python>\lib\site-packages\ray\dashboard\modules\node\node_head.py", line 250, in _update_node_stats
    reply = await stub.GetNodeStats(
  File "<python>\lib\site-packages\grpc\aio\_call.py", line 290, in __await__
    raise _create_rpc_error(self._cython_call._initial_metadata,
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1650040804.364000000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3135,"referenced_errors":[{"created":"@1650040804.364000000","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}"
>
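To see which log files contain the error, and how often, something like the following could be used. This is only a rough sketch: the helper name `count_marker_in_logs` is made up for illustration, and the default log directory in the `__main__` block is an assumption (Ray's session logs under the system temp directory), so the path may need adjusting.

```python
import tempfile
from pathlib import Path

# String taken from the dashboard.log traceback above.
MARKER = "StatusCode.UNAVAILABLE"


def count_marker_in_logs(log_dir, marker=MARKER):
    """Return {file name: number of lines containing `marker`} for *.log files.

    Hypothetical helper, not part of Ray. Files without a match are omitted.
    """
    counts = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" so that odd bytes in a log file do not abort the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = sum(1 for line in text.splitlines() if marker in line)
        if hits:
            counts[path.name] = hits
    return counts


if __name__ == "__main__":
    # ASSUMPTION: logs live in <temp dir>/ray/session_latest/logs.
    # If that directory does not exist, point this at the actual session folder.
    default_dir = Path(tempfile.gettempdir()) / "ray" / "session_latest" / "logs"
    if default_dir.is_dir():
        for name, hits in count_marker_in_logs(default_dir).items():
            print(f"{name}: {hits}")
    else:
        print(f"Log directory not found: {default_dir}")
```

Counting per file makes it easier to tell whether only the dashboard is affected or whether other components report the same status.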

It fails even with the Windows firewall disabled, although it does work when run in WSL (Ubuntu-18.04).
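Since the gRPC message is "failed to connect to all addresses", one thing that might help narrow this down is checking, from inside the same virtual environment, whether a plain TCP connection to a port on 127.0.0.1 can be opened at all. The sketch below uses only the standard library. `can_connect` is a hypothetical helper, and the port has to be supplied by hand (for example one that shows up in the Ray logs), because this does not try to guess which ports Ray picked.

```python
import socket
import sys


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical diagnostic helper. It only tests TCP reachability,
    not whether a gRPC server is actually answering on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable addresses.
        return False


if __name__ == "__main__":
    # Usage: python check_port.py <port>
    # The port is an input here; take it from the Ray logs of the failing session.
    if len(sys.argv) != 2:
        print("usage: python check_port.py <port>")
    else:
        port = int(sys.argv[1])
        print(f"127.0.0.1:{port} reachable: {can_connect('127.0.0.1', port)}")
```

If this returns False while the Ray processes are still running, the problem is probably below gRPC (nothing listening, or something blocking loopback traffic). If it returns True, the failure more likely lies elsewhere.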

Versions / Dependencies

Relevant Python 3.9.7 libraries explicitly installed via Pipenv 2020.11.15:

ray[default, tune]==1.12.0

Windows 10 Pro version: 19044.1645

Reproduction script

In my case, I used a Pipenv virtual environment, but I got the same result when testing with venv:

pip install --user pipenv
# Need to manually install redis for ray to start
pipenv install ray[default] redis
pipenv run python ./main.py

Contents of main.py:

import time
import ray

ray.init()
time.sleep(30)

Issue Severity

Medium: It is a significant difficulty but I can work around it.

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 26 (14 by maintainers)

Most upvoted comments

@mattip I’ve updated the original post with a shorter reproduction script and the installation steps to recreate the environment.

Edit: Uninstalling Python 3.10 and removing the two PATH entries didn’t help either.