ray: [Ray 2.3 release] `dask_on_ray_1tb_sort (a954ab7)` OOM failure
OOM failure for dask_on_ray_1tb_sort (a954ab7)
Buildkite: https://buildkite.com/ray-project/release-tests-branch/builds/1332#01860fab-fa24-4968-b784-b80406311235
Cluster: https://console.anyscale.com/o/anyscale-internal/projects/prj_FKRmeV5pA6X72aVscFALNC32/clusters/ses_b5qgvrfp9n88fsqly46wjtiud7?command-history-section=command_history&drivers-section=interactive-sessions
```
(autoscaler +28s) Resized to 264 CPUs.
Traceback (most recent call last):
  File "dask_on_ray/dask_on_ray_sort.py", line 230, in <module>
    file_path=args.file_path,
  File "dask_on_ray/dask_on_ray_sort.py", line 145, in trial
    10, npartitions=-1
  File "/home/ray/anaconda3/lib/python3.7/site-packages/dask/dataframe/core.py", line 1140, in head
    return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/dask/dataframe/core.py", line 1174, in _head
    result = result.compute()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/dask/base.py", line 290, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/dask/base.py", line 573, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/util/dask/scheduler.py", line 203, in ray_dask_get
    result = ray_get_unpack(object_refs, progress_bar_actor=pb_actor)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/util/dask/scheduler.py", line 504, in ray_get_unpack
    computed_result = get_result(object_refs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/util/dask/scheduler.py", line 491, in get_result
    return ray.get(object_refs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/worker.py", line 2380, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError: ray::dask:('head-1000-10-sort_index-be01da623ff06afe98fde4f50eda005f', 0)() (pid=1909, ip=10.0.3.188)
At least one of the input arguments for this task could not be computed:
ray.exceptions.RayTaskError: ray::dask:('head-partial-10-sort_index-be01da623ff06afe98fde4f50eda005f', 808)() (pid=2278, ip=10.0.3.188)
At least one of the input arguments for this task could not be computed:
ray.exceptions.OutOfMemoryError: Task was killed due to the node running low on memory.
Memory on the node (IP: 10.0.3.195, ID: 1fbfe866cadd55f7c78b1549a38263d90f3d9b5a2a01f6969c0d4344) where the task (task ID: b2361cda760e73ce485cb3b069cdda885538befc03000000, name=dask:('sort_index-be01da623ff06afe98fde4f50eda005f', 808), pid=466, memory used=12.27GB) was running was 209.96GB / 219.60GB (0.956091), which exceeds the memory usage threshold of 0.95. Ray killed this worker (ID: 9c704e9766df13b6da7d15dbda87aaa7630eba9dc2122a014d0891a2) because it was the most recently scheduled task; to see more information about memory usage on this node, use `ray logs raylet.out -ip 10.0.3.195`. To see the logs of the worker, use `ray logs worker-9c704e9766df13b6da7d15dbda87aaa7630eba9dc2122a014d0891a2*out -ip 10.0.3.195. Top 10
```
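The OOM message above describes the decision that killed the `sort_index` task: node memory usage (209.96GB / 219.60GB, roughly 0.956 from the rounded figures) exceeded the 0.95 threshold, and the most recently scheduled task was chosen as the victim. The following is a minimal sketch of that decision under a simplified model, not Ray's actual memory monitor code; the `RunningTask` class, its `scheduled_at` field, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold value quoted in the OOM message; the rest of this model is hypothetical.
MEMORY_USAGE_THRESHOLD = 0.95


@dataclass
class RunningTask:
    name: str
    scheduled_at: float  # hypothetical: larger value = scheduled more recently
    memory_used_gb: float


def usage_fraction(used_gb: float, total_gb: float) -> float:
    """Fraction of the node's memory currently in use."""
    return used_gb / total_gb


def pick_task_to_kill(
    tasks: List[RunningTask],
    used_gb: float,
    total_gb: float,
    threshold: float = MEMORY_USAGE_THRESHOLD,
) -> Optional[RunningTask]:
    """Return the most recently scheduled task if usage exceeds the threshold, else None."""
    if not tasks or usage_fraction(used_gb, total_gb) <= threshold:
        return None
    # Victim policy described in the message: newest task first.
    return max(tasks, key=lambda t: t.scheduled_at)
```

With the figures from this failure, `pick_task_to_kill(tasks, 209.96, 219.60)` selects whichever running task has the largest `scheduled_at`, regardless of how much memory that task itself uses (12.27GB here). This is why a memory-hungry stage scheduled earlier can survive while a newer task is killed.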
About this issue
- State: closed
- Created a year ago
- Comments: 15 (14 by maintainers)
Thanks @clarng
Actually, we should keep this open until the commit is cherry-picked.
Bisecting has finished, and https://github.com/ray-project/ray/pull/31976 is causing the nightly test failure here.
The full bisection runs are recorded here.
I'm not sure whether this is due to the nightly test itself or a problem in the actual code.
cc @clarng and @scv119 - could you help take a look? Thanks.
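Bisecting a nightly failure like this one is a binary search over the ordered commits between the last passing and the first failing run, where each probe means running the release test on one commit. A minimal sketch, assuming the newest commit fails and that once a commit fails every later one also fails; the `is_bad` predicate is a hypothetical stand-in for running `dask_on_ray_1tb_sort` on a given commit, and this is not the tooling actually used for the runs above.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad_commit(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest failing commit, using O(log n) calls to is_bad.

    Assumes commits are ordered oldest to newest and failures are monotonic.
    """
    if not commits:
        raise ValueError("no commits to bisect")
    lo, hi = 0, len(commits) - 1
    # Invariant: every commit before index lo passes; commits[hi] fails.
    if not is_bad(commits[hi]):
        raise ValueError("newest commit does not reproduce the failure")
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first failure is at mid or earlier
        else:
            lo = mid + 1  # the first failure is after mid
    return commits[hi]
```

Because each probe is a full 1TB sort run, the logarithmic number of probes is what makes this practical: eight candidate commits need at most four test runs, counting the initial check of the newest commit.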