ray: [ray] Objects are being evicted improperly
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): local machines
- Ray installed from (source or binary): pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.8.0.dev4-cp37-cp37m-manylinux1_x86_64.whl
- Ray version: 0.8.0.dev4
- Python version: 3.7
- Exact command to reproduce:
import ray
import torch

ray.init(object_store_memory=int(100e6))

@ray.remote
def identity(vectors):
    return [ray.put(ray.get(vec)) for vec in vectors]

obj_id = ray.put(torch.randn(int(1e5)))
vectors = [obj_id for _ in range(200)]
while True:
    vectors = ray.get(identity.remote(vectors))
Describe the problem
This code throws the following error.
2019-09-03 13:46:22,598 WARNING worker.py:1797 -- The task with ID ffffffffffffffffffff01000000 is a driver task and so the object created by ray.put could not be reconstructed.
(pid=38222) 2019-09-03 13:46:23,308 INFO worker.py:432 -- The object with ID ObjectID(7d58f415c89effffffff0100000000c001000000) already exists in the object store.
2019-09-03 13:46:28,320 ERROR worker.py:1737 -- Possible unhandled error from worker: ray_worker (pid=38222, host=atlas)
ray.exceptions.UnreconstructableError: Object ffffffffffffffffffff01000000008002000000 is lost (either LRU evicted or deleted by user) and cannot be reconstructed. Try increasing the object store memory available with ray.init(object_store_memory=<bytes>) or setting object store limits with ray.remote(object_store_memory=<bytes>). See also: https://ray.readthedocs.io/en/latest/memory-management.html
However, if you replace the definition of obj_id with obj_id = ray.put(list(range(int(1e5)))) or with obj_id = torch.randn(int(1e5)), we get the correct error, which @ericl's recent PR added:
(pid=46751) 2019-09-03 13:56:21,919 INFO worker.py:2381 -- Put failed since the value was either too large or the store was full of pinned objects. If you are putting and holding references to a lot of object ids, consider ray.put(value, weakref=True) to allow object data to be evicted early.
However, neither error should be raised: we have only 80 MB of objects, and the object store has a capacity of 100 MB.
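For reference, the 80 MB figure can be checked from the numbers in the reproduction script. This is only a rough sketch of the arithmetic: it assumes torch.randn returns float32 values (4 bytes per element, PyTorch's default dtype unless changed) and ignores any per-object serialization overhead in the object store. The constant names are illustrative and do not come from the issue.

```python
# Assumptions (not stated in the issue): float32 elements, no serialization overhead.
BYTES_PER_FLOAT32 = 4
ELEMENTS_PER_TENSOR = int(1e5)      # torch.randn(int(1e5))
NUM_OBJECTS = 200                   # length of the `vectors` list
STORE_CAPACITY_BYTES = int(100e6)   # ray.init(object_store_memory=int(100e6))

# Size of one tensor and of one full list of 200 distinct tensors.
bytes_per_object = ELEMENTS_PER_TENSOR * BYTES_PER_FLOAT32
total_bytes = NUM_OBJECTS * bytes_per_object

print(f"per object: {bytes_per_object / 1e6:.1f} MB")
print(f"{NUM_OBJECTS} objects: {total_bytes / 1e6:.1f} MB")
print(f"store capacity: {STORE_CAPACITY_BYTES / 1e6:.1f} MB")
```

Under these assumptions, one full list of 200 tensors occupies 80 MB, which is below the configured 100 MB capacity.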
About this issue
- State: closed
- Created 5 years ago
- Comments: 21 (5 by maintainers)
@ericl it's not just drivers that I can pin objects in, right? I could also pin an object in an actor?
Ok great, in this case I would recommend you switch to using actors. Actors are long-lived, so if you pin an object in an actor, it will stay there for as long as the actor holds a reference to it.