ray: [Core] Getting/creating an actor from multiple threads raises errors
What happened + What you expected to happen
Creating/getting an actor from multiple threads like this:
import ray
import threading
import time
import random

@ray.remote
class bar:
    pass

def foo():
    time.sleep(random.random())
    bar.options(name="bar", namespace="bar_name", get_if_exists=True, lifetime="detached").remote()

threads = []
for i in range(1000):
    threads.append(threading.Thread(target=foo))
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
sometimes results in an error:
Traceback (most recent call last):
  File "/Users/andrewxue/anaconda3/envs/ray/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/Users/andrewxue/anaconda3/envs/ray/lib/python3.9/threading.py", line 917, in run
    return ray.get_actor(name, namespace=namespace)
  File "/Users/andrewxue/fork/ray/python/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    self._target(*self._args, **self._kwargs)
  File "/Users/andrewxue/fork/ray/python/ray/data/tests/test.py", line 13, in foo
    bar.options(name="bar", namespace="bar_name", get_if_exists=True, lifetime="detached").remote()
  File "/Users/andrewxue/fork/ray/python/ray/actor.py", line 687, in remote
    return fn(*args, **kwargs)
  File "/Users/andrewxue/fork/ray/python/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/Users/andrewxue/fork/ray/python/ray/_private/worker.py", line 2845, in get_actor
    return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
  File "/Users/andrewxue/fork/ray/python/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return worker.core_worker.get_named_actor_handle(name, namespace or "")
  File "python/ray/_raylet.pyx", line 4021, in ray._raylet.CoreWorker.get_named_actor_handle
    return fn(*args, **kwargs)
  File "/Users/andrewxue/fork/ray/python/ray/util/tracing/tracing_helper.py", line 388, in _invocation_actor_class_remote_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/Users/andrewxue/fork/ray/python/ray/actor.py", line 781, in _remote
    return ray.get_actor(name, namespace=namespace)
  File "/Users/andrewxue/fork/ray/python/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
  File "python/ray/_raylet.pyx", line 453, in ray._raylet.check_status
    return fn(*args, **kwargs)
  File "/Users/andrewxue/fork/ray/python/ray/_private/client_mode_hook.py", line 103, in wrapper
ValueError: Failed to look up actor with name 'bar'. This could because 1. You are trying to look up a named actor you didn't create. 2. The named actor died. 3. You did not use a namespace matching the namespace of the actor.
Versions / Dependencies
latest master
Reproduction script
script given above
Issue Severity
Medium: It is a significant difficulty but I can work around it.
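The issue does not say which workaround was used. One possibility, offered only as an assumption, is to retry the get-or-create call when it fails with the `ValueError` shown in the traceback. The helper below is a minimal sketch of that idea; `retry_on_value_error`, `attempts`, and `delay_s` are hypothetical names, not part of Ray.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_value_error(fn: Callable[[], T], attempts: int = 5, delay_s: float = 0.1) -> T:
    """Call fn(), retrying when it raises ValueError (hypothetical helper).

    Assumption: the failed lookup is transient, so a later attempt may
    succeed once the named actor has been registered.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except ValueError:
            # Re-raise the original error once the retry budget is spent.
            if attempt == attempts - 1:
                raise
            time.sleep(delay_s)
    raise AssertionError("unreachable")


# Possible usage with the reproduction script's actor class (requires Ray):
# handle = retry_on_value_error(
#     lambda: bar.options(
#         name="bar", namespace="bar_name", get_if_exists=True, lifetime="detached"
#     ).remote()
# )
```

Retrying hides the race rather than removing it, so it only suits callers that can tolerate the extra latency.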
About this issue
- State: closed
- Created 7 months ago
- Comments: 16 (15 by maintainers)
Commits related to this issue
- [data] fix flaky stats manager test (#41299) Concurrently creating or getting an actor can cause errors (#41324). This pr puts a lock on `_get_or_create_stats_actor`. Also catches any exceptions r... — committed to ray-project/ray by Zandew 7 months ago
- [data] fix flaky stats manager test (#41299) Concurrently creating or getting an actor can cause errors (#41324). This pr puts a lock on `_get_or_create_stats_actor`. Also catches any exceptions r... — committed to ujjawal-khare-27/ray by Zandew 7 months ago
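The commits above describe putting a lock on `_get_or_create_stats_actor`. The actual implementation is not shown in this thread, so the following is only a sketch of the general idea: serialize get-or-create calls within one process using a `threading.Lock`. The names `_get_or_create_lock` and `serialized_get_or_create` are hypothetical.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical process-wide lock; the name is illustrative, not part of Ray.
_get_or_create_lock = threading.Lock()


def serialized_get_or_create(get_or_create: Callable[[], T]) -> T:
    """Run a get-or-create callable while holding a process-wide lock.

    At most one thread in this process executes the callable at a time,
    so concurrent threads cannot race each other on the same actor name.
    """
    with _get_or_create_lock:
        return get_or_create()


# Possible usage with the reproduction script's actor class (requires Ray):
# def foo():
#     serialized_get_or_create(
#         lambda: bar.options(
#             name="bar", namespace="bar_name", get_if_exists=True, lifetime="detached"
#         ).remote()
#     )
```

This only coordinates threads inside a single process. It does nothing for separate processes or drivers that get or create the same named actor concurrently.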
I was able to reproduce the issue. Looking further.
@anyscalesam I think this is the one: https://github.com/ray-project/ray/issues/44083
Btw, it’d be nice to think about how to safely make these APIs thread-safe. Right now, my impression is that it just happens to work (and is prone to breaking).
@jobh Thanks for reporting. Can you open a separate issue and link this one? We can combine them when investigations provide more evidence they are the same.
Discussed internally in core - we’ll possibly plan for this in ray210.