ray: [release/core] scalability envelope distributed test throws `max number of clients reached`
Running with `ray submit --start config.yaml test_distributed.py` throws hundreds of these:
(raylet, ip=172.31.29.217) Traceback (most recent call last):
(raylet, ip=172.31.29.217) File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/ray/workers/default_worker.py", line 186, in <module>
(raylet, ip=172.31.29.217) connect_only=True)
(raylet, ip=172.31.29.217) File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/ray/node.py", line 164, in __init__
(raylet, ip=172.31.29.217) session_name = _get_with_retry(redis_client, "session_name")
(raylet, ip=172.31.29.217) File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/ray/node.py", line 41, in _get_with_retry
(raylet, ip=172.31.29.217) result = redis_client.get(key)
(raylet, ip=172.31.29.217) File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/redis/client.py", line 1606, in get
(raylet, ip=172.31.29.217) return self.execute_command('GET', name)
(raylet, ip=172.31.29.217) File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/redis/client.py", line 898, in execute_command
(raylet, ip=172.31.29.217) conn = self.connection or pool.get_connection(command_name, **options)
(raylet, ip=172.31.29.217) File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/redis/connection.py", line 1192, in get_connection
(raylet, ip=172.31.29.217) connection.connect()
(raylet, ip=172.31.29.217) File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/redis/connection.py", line 567, in connect
(raylet, ip=172.31.29.217) self.on_connect()
(raylet, ip=172.31.29.217) File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/redis/connection.py", line 643, in on_connect
(raylet, ip=172.31.29.217) auth_response = self.read_response()
(raylet, ip=172.31.29.217) File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/redis/connection.py", line 739, in read_response
(raylet, ip=172.31.29.217) response = self._parser.read_response()
(raylet, ip=172.31.29.217) File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/redis/connection.py", line 484, in read_response
(raylet, ip=172.31.29.217) raise response
(raylet, ip=172.31.29.217) redis.exceptions.ConnectionError: max number of clients reached
Wheel: https://s3-us-west-2.amazonaws.com/ray-wheels/releases/1.3.0/cb3661e547662f309a0cc55c5495b3adb779a309/ray-1.3.0-cp37-cp37m-manylinux2014_x86_64.whl
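In the traceback above, Redis rejects the worker's connection with `max number of clients reached`. One way to check how close a Redis server is to that limit is to compare its `connected_clients` count with its `maxclients` setting. The following is a minimal diagnostic sketch, not part of Ray: `redis_client_headroom` is a hypothetical helper name, and it assumes a client object that exposes redis-py's `info` and `config_get` methods.

```python
def redis_client_headroom(client):
    """Return (connected, limit, remaining) client slots for a Redis server.

    `client` is assumed to expose redis-py's `info(section)` and
    `config_get(pattern)` methods, e.g. an instance of `redis.Redis`.
    This helper is illustrative only and is not part of Ray.
    """
    # INFO clients reports the number of currently connected clients.
    connected = int(client.info("clients")["connected_clients"])
    # CONFIG GET maxclients returns the configured connection limit.
    limit = int(client.config_get("maxclients")["maxclients"])
    return connected, limit, limit - connected
```

Called as `redis_client_headroom(redis.Redis(host=..., port=..., password=...))`, where the host, port, and password are placeholders for the cluster's Redis instance, this returns the three counts. Once the limit has been reached, a new diagnostic connection would presumably be refused as well, so a check like this is likely only useful before the cluster saturates or from a connection that is already open.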
About this issue
- State: closed
- Created 3 years ago
- Comments: 59 (59 by maintainers)
@amogkam can you set it to 1 so that you can continue running the release test? We shouldn’t commit a change that sets it to 1 though. We should just fix the bug.
WTH, this should not happen. CC @DmitriGekhtman, can you please help investigate this? This is a release blocker.
Running this now with the updated cluster config.
OK, the node scalability test ended up failing.
@wuisawesome
I’m running it again right now.
No, I’m replacing it with the 1.3 wheels.
@rkooo567 yep I’m trying that now