ray: Redis connection errors when calling ray.init [tune]
What is the problem?
I’m trying to set up a basic environment to use TensorTrade (TensorFlow) and ray[tune], but I get the following error when ray.init tries to connect to Redis:
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
Redis is set up, I’ve configured the password, and I can connect ok using redis-cli.
I’ve put a monitor on the Redis server, and I can see that Ray connects initially, but then it stops:
1620711360.767056 [0 127.0.0.1:57493] "AUTH" "secret"
1620711360.769051 [0 127.0.0.1:57494] "AUTH" "secret"
1620711360.770909 [0 127.0.0.1:57494] "SET" "redis_start_time" "1620711360.7703528"
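You can repeat the redis-cli check from Python. That makes it easy to try both 127.0.0.1 and the machine’s LAN address from the same interpreter Ray runs in. The sketch below is an illustration and not part of Ray or redis-py. It hand-encodes the AUTH and PING commands in the Redis wire protocol (RESP) over a plain socket. It assumes the server answers with single-line replies such as +OK and +PONG. The function names are hypothetical.

```python
import socket
from typing import Optional


def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings (Redis wire format)."""
    chunks = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        chunks.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(chunks)


def redis_auth_ping(host: str, port: int, password: Optional[str] = None,
                    timeout: float = 2.0) -> str:
    """Send AUTH (if a password is given) and PING; return the last reply line.

    "+PONG" means the server is reachable and accepted the password; a line
    starting with "-" is an error reply (e.g. a wrong password). Socket-level
    failures such as ConnectionRefusedError propagate to the caller.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as replies:
            if password is not None:
                sock.sendall(encode_command("AUTH", password))
                reply = replies.readline().decode().rstrip("\r\n")
                if not reply.startswith("+"):
                    return reply  # AUTH was rejected
            sock.sendall(encode_command("PING"))
            return replies.readline().decode().rstrip("\r\n")
```

On the machine described here, one would expect `redis_auth_ping("127.0.0.1", 6379, password=...)` to return "+PONG". The same call against 192.168.20.13 would be expected to raise ConnectionRefusedError.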
I tracked the code through and found this in services.py:
def address_to_ip(address):
    ...
    # Make sure localhost isn't resolved to the loopback ip
    if ip_address == "127.0.0.1":
        ip_address = get_node_ip_address()
    return ":".join([ip_address] + address_parts[1:])
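The sketch below is a simplified, self-contained stand-in for the function above that shows why 192.168.20.13 shows up. It is not Ray’s code. `discover_node_ip` is a hypothetical substitute for `get_node_ip_address`. It uses a common trick: "connect" a UDP socket to a public address, then read back the local address the OS picked for that route. My understanding is that Ray’s helper works in a similar way, but treat that as an assumption. The `node_ip` parameter is also hypothetical and only exists so the substitution can be tested.

```python
import socket
from typing import Optional


def discover_node_ip(probe_host: str = "8.8.8.8", probe_port: int = 53) -> str:
    """Return the local IP of the interface the OS would route probe_host through.

    connect() on a UDP socket sends no packets; it only makes the OS choose a
    route, after which getsockname() reports the chosen local address.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((probe_host, probe_port))
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # assumption: no route available, fall back to loopback
    finally:
        s.close()


def address_to_ip(address: str, node_ip: Optional[str] = None) -> str:
    """Resolve the host part of "host:port", swapping loopback for the node IP."""
    host, *rest = address.split(":")
    ip = socket.gethostbyname(host)
    if ip == "127.0.0.1":
        # The step in question: 127.0.0.1 is replaced by the LAN address.
        ip = node_ip if node_ip is not None else discover_node_ip()
    return ":".join([ip] + rest)
```

On a machine whose LAN interface is 192.168.20.13, `address_to_ip("127.0.0.1:6379")` would return "192.168.20.13:6379". That is the address the firewall then blocks.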
It seems that even though I pass in 127.0.0.1, this code replaces it with 192.168.20.13. Ray appears to connect fine on the 127… address, but not on the 192… address. Unfortunately, the system I’m running on is controlled by a group policy, and I cannot turn off the Windows firewall completely. I can telnet to Redis on 127…, but not on 192… When I installed Redis it added firewall rules, but I think the group policy may still block connections on 192…
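The telnet test can be scripted in Python with a plain TCP connect, which is the first thing any Redis client does. Below is a minimal sketch; the function name is hypothetical. WinError 10061 surfaces in Python as ConnectionRefusedError, which is a subclass of OSError, so it lands in the except branch along with timeouts.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (like telnet)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (WinError 10061), timeouts, unreachable hosts
        return False
```

On the setup described above, `is_reachable("127.0.0.1", 6379)` would be expected to return True and `is_reachable("192.168.20.13", 6379)` to return False.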
So I commented out these two lines of code from address_to_ip:
# if ip_address == "127.0.0.1":
#     ip_address = get_node_ip_address()
Then when I run, I get this error:
2021-05-12 20:14:12,212 INFO worker.py:663 -- Connecting to existing Ray cluster at address: 127.0.0.1:6379
...
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\node.py", line 48, in _get_with_retry
raise RuntimeError(f"Could not read '{key}' from GCS (redis). "
RuntimeError: Could not read 'session_name' from GCS (redis). Has redis started correctly on the head node?
I’m assuming this is caused by the group policies on my system, which prevent me from enabling access on 192…, so I’m happy to do my testing with those two lines commented out to force the connection to use 127… But it would be nicer if I could do that through configuration.
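"Doing it through configuration" could mean guarding the substitution with an opt-out switch instead of commenting it out. This is only a design sketch. `maybe_replace_loopback` and the `KEEP_LOOPBACK_IP` environment variable are hypothetical and are not part of Ray.

```python
import os
from typing import Mapping, Optional

# Hypothetical switch: this environment variable does not exist in Ray.
KEEP_LOOPBACK_ENV = "KEEP_LOOPBACK_IP"


def maybe_replace_loopback(ip: str, node_ip: str,
                           env: Optional[Mapping[str, str]] = None) -> str:
    """Swap 127.0.0.1 for node_ip unless the opt-out variable is set to "1"."""
    env = os.environ if env is None else env
    if ip == "127.0.0.1" and env.get(KEEP_LOOPBACK_ENV) != "1":
        return node_ip
    return ip
```

The environment is passed in as a parameter, which keeps the function testable without touching the real process environment.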
However, now I’m stuck on the “Could not read ‘session_name’” error. I don’t know whether it is related to the 127… change or something else.
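Some context on the ‘session_name’ error: the traceback shows `Node.__init__` calling `_get_with_retry(redis_client, "session_name")`. In other words, it polls Redis for a key and gives up if the key never appears. The log also says "Connecting to existing Ray cluster". One plausible explanation, which is an assumption and not confirmed here, is that `ray.init(address=...)` expects a Ray head node to have already written that key. A standalone Redis server started by hand would not contain it. Below is a minimal sketch of the polling pattern, with a dict standing in for the Redis client. The function name, retry count and delay are illustrative values, not Ray’s.

```python
import time
from typing import Any


def get_with_retry(client: Any, key: str, num_retries: int = 20,
                   delay_s: float = 0.1) -> Any:
    """Poll client.get(key) until it returns a value or retries run out."""
    for _ in range(num_retries):
        value = client.get(key)
        if value is not None:
            return value
        time.sleep(delay_s)
    raise RuntimeError(
        f"Could not read '{key}' from GCS (redis). "
        "Has redis started correctly on the head node?"
    )
```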
I also tried removing the 127… address from ray.init(), but then I got this error:
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\redis\connection.py", line 563, in connect
raise ConnectionError(self._error_message(e))
ConnectionError: Error 10061 connecting to 127.0.0.1:17091. No connection could be made because the target machine actively refused it.
...
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\_private\services.py", line 670, in wait_for_redis_to_start
raise RuntimeError(
RuntimeError: Unable to connect to Redis at 127.0.0.1:17091 after 12 retries. Check that 127.0.0.1:17091 is reachable from this machine. If it is not, your firewall may be blocking this port. If the problem is a flaky connection, try setting the environment variable `RAY_START_REDIS_WAIT_RETRIES` to increase the number of attempts to ping the Redis server.
Where did port 17091 come from?
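On where 17091 comes from: in the full trace it is the value of `redis_shard_port` returned by `_start_redis_instance`. So it appears to be the port of an extra Redis shard that ray.init() starts when no address is given. In that case it launches a new head node rather than reusing the Redis on 6379. I believe Ray picks such ports at random when none is configured, so the number would change between runs; that is an assumption. The sketch below shows that kind of random choice with an availability check. `pick_random_port` and its bounds are hypothetical.

```python
import random
import socket
from typing import Iterable, Optional


def pick_random_port(lower: int = 10000, upper: int = 65535,
                     denylist: Optional[Iterable[int]] = None,
                     attempts: int = 100) -> int:
    """Return a random port in [lower, upper] that is not denylisted and can be bound."""
    denied = set(denylist or ())
    for _ in range(attempts):
        port = random.randint(lower, upper)
        if port in denied:
            continue
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                continue  # already in use; try another candidate
            return port
    raise RuntimeError(f"No free port found in [{lower}, {upper}]")
```

The check does not reserve the port, because another process could take it before the server binds. It also says nothing about firewall rules, which are usually added per program or per port.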
I’ve been discussing this on the Redis Discord channel. They helped me get this far in the investigation, but they suggested I log an issue here.
Ray version and other system information (Python version, TensorFlow version, OS): Python 3.8, Windows 10 x64. Everything else was fresh pip installs this week.
Reproduction (REQUIRED)
Please provide a short code snippet (less than 50 lines if possible) that can be copy-pasted to reproduce the issue. The snippet should have no external library dependencies (i.e., use fake or mock data / environments):
import ray
ray.init(address="127.0.0.1:6379", _redis_password="foo!bared")
- [x] I have verified my script runs in a clean environment and reproduces the issue. (This is my only PC; I don’t have another one for testing.)
- [x] I have verified the issue also occurs with the latest wheels.
Full stack trace:
File "C:\Users\me\Desktop\Junk\Python\PY\untitled0.py", line 2, in <module>
ray.init(_redis_password="foo!bared")
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\_private\client_mode_hook.py", line 62, in wrapper
return func(*args, **kwargs)
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\worker.py", line 730, in init
_global_node = ray.node.Node(
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\node.py", line 230, in __init__
self.start_head_processes()
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\node.py", line 860, in start_head_processes
self.start_redis()
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\node.py", line 675, in start_redis
process_infos) = ray._private.services.start_redis(
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\_private\services.py", line 881, in start_redis
primary_redis_client.set("NumRedisShards", str(num_redis_shards))
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\redis\client.py", line 1801, in set
return self.execute_command('SET', *pieces)
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\redis\client.py", line 898, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\redis\connection.py", line 1192, in get_connection
connection.connect()
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\redis\connection.py", line 563, in connect
raise ConnectionError(self._error_message(e))
ConnectionError: Error 10061 connecting to 192.168.20.13:6379. No connection could be made because the target machine actively refused it.
File "c:\users\me\desktop\junk\python\py\untitled0.py", line 7, in <module>
ray.init(address="127.0.0.1:6379", _redis_password="foo!bared")
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\_private\client_mode_hook.py", line 62, in wrapper
return func(*args, **kwargs)
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\worker.py", line 767, in init
_global_node = ray.node.Node(
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\node.py", line 163, in __init__
session_name = _get_with_retry(redis_client, "session_name")
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\node.py", line 48, in _get_with_retry
raise RuntimeError(f"Could not read '{key}' from GCS (redis). "
RuntimeError: Could not read 'session_name' from GCS (redis). Has redis started correctly on the head node?
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\redis\connection.py", line 559, in connect
sock = self._connect()
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\redis\connection.py", line 615, in _connect
raise err
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\redis\connection.py", line 603, in _connect
sock.connect(socket_address)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\_private\services.py", line 656, in wait_for_redis_to_start
redis_client.client_list()
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\redis\client.py", line 1194, in client_list
return self.execute_command('CLIENT LIST')
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\redis\client.py", line 898, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\redis\connection.py", line 1192, in get_connection
connection.connect()
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\redis\connection.py", line 563, in connect
raise ConnectionError(self._error_message(e))
ConnectionError: Error 10061 connecting to 127.0.0.1:17091. No connection could be made because the target machine actively refused it.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\users\me\desktop\junk\python\py\untitled0.py", line 7, in <module>
ray.init(_redis_password="foo!bared")
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\_private\client_mode_hook.py", line 62, in wrapper
return func(*args, **kwargs)
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\worker.py", line 730, in init
_global_node = ray.node.Node(
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\node.py", line 230, in __init__
self.start_head_processes()
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\node.py", line 860, in start_head_processes
self.start_redis()
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\node.py", line 675, in start_redis
process_infos) = ray._private.services.start_redis(
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\_private\services.py", line 917, in start_redis
redis_shard_port, p = _start_redis_instance(
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\_private\services.py", line 1028, in _start_redis_instance
wait_for_redis_to_start("127.0.0.1", port, password=password)
File "C:\Users\me\Anaconda3\envs\keras\lib\site-packages\ray\_private\services.py", line 670, in wait_for_redis_to_start
raise RuntimeError(
RuntimeError: Unable to connect to Redis at 127.0.0.1:17091 after 12 retries. Check that 127.0.0.1:17091 is reachable from this machine. If it is not, your firewall may be blocking this port. If the problem is a flaky connection, try setting the environment variable `RAY_START_REDIS_WAIT_RETRIES` to increase the number of attempts to ping the Redis server.
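The final error comes from a retry loop that keeps trying to reach the newly started Redis instance. A rough sketch of such a loop follows. `RAY_START_REDIS_WAIT_RETRIES` is the variable named in the error message, and the fallback of 12 matches the count it reports. The function name, the fixed delay and the plain TCP connect are simplifications; per the trace, Ray’s loop actually issues a CLIENT LIST command. More retries only help if the server is slow to start. A refusal on every attempt suggests nothing is listening on that port at all.

```python
import os
import socket
import time
from typing import Optional


def wait_for_port(host: str, port: int, retries: Optional[int] = None,
                  delay_s: float = 0.1) -> None:
    """Retry a TCP connect until it succeeds or the retry budget is exhausted."""
    if retries is None:
        retries = int(os.environ.get("RAY_START_REDIS_WAIT_RETRIES", "12"))
    for _ in range(retries):
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return
        except OSError:
            time.sleep(delay_s)  # e.g. ConnectionRefusedError while the server starts
    raise RuntimeError(f"Unable to connect to {host}:{port} after {retries} retries.")
```

To raise the real limit, the safest option is to set the variable in the environment before starting Python. Setting it before importing ray should also work. It is not certain that a later change to `os.environ` is picked up.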
About this issue
- State: closed
- Created 3 years ago
- Comments: 17 (13 by maintainers)
Since redis is no longer the default message broker, can we close this?
I can help answer this. Ray and Ray-based Modin work well now. Here is the env info: ray_success.yaml.txt
I’m assigning this to you @mwtian since you are touching this codepath as part of the GCS work. Let us know if you need help working on this / if it turns out to be windows specific 😃