ray: [Bug] modin-on-ray's unwrap_partitions is 100x slower on Mac than on Windows
Search before asking
- I searched the issues and found no similar issues.
Ray Component
Ray Core
What happened + What you expected to happen
I was doing some development on Modin and wanted to call unwrap_partitions, which uses Ray to materialize the partitions constituting a Modin dataframe. I found that the function call took 122 seconds on my MacBook, but only 721 milliseconds on a Windows computer.
Versions / Dependencies
Before running the reproduction script, install Modin with
pip install modin
and download this file.
My Mac is on:
- MacBook Pro (16-inch, 2019), macOS Big Sur 11.5.2
- Python 3.8.8
- Ray 1.8.0
- Modin 0.11.1+37.g0a3acc15
- RAM: 16 GB 2667 MHz DDR4
ray.cluster_resources(): {'CPU': 16.0, 'object_store_memory': 3248034201.0, 'memory': 6496068404.0, 'node:127.0.0.1': 1.0}
The Windows computer is:
- Windows 10 machine with 10 cores
- Python 3.8.5
- Ray 1.7.1
- Modin 0.11.1+45.g41213581
Reproduction script
from modin.distributed.dataframe.pandas import unwrap_partitions
import modin.pandas as pd
import ray
from modin.config import NPartitions

fdf = pd.read_csv("test_700kx256.csv")
%time ray.wait(unwrap_partitions(fdf, axis=0), num_returns=NPartitions.get())
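The original CSV is not attached here. As a stand-in, a file of the shape implied by the filename (roughly 700k rows by 256 columns) can be generated with plain pandas; the column names and value distribution below are assumptions, and the resulting file is several GB:

import numpy as np
import pandas  # plain pandas, only used to write the stand-in file

# Assumed shape, inferred from the filename "test_700kx256.csv".
rows, cols = 700_000, 256
frame = pandas.DataFrame(
    np.random.rand(rows, cols),
    columns=[f"col{i}" for i in range(cols)],
)
frame.to_csv("test_700kx256.csv", index=False)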
Anything else
Some other Modin operations have been drastically slower on my Mac than on the same Windows machine. I have observed the same problem with other Macs. I have tried to provide a simple, reproducible example here.
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
About this issue
- State: closed
- Created 3 years ago
- Comments: 64 (49 by maintainers)
Haha wow, I didn’t realize this was actually faster on Windows! First time something is faster on Windows I think 😃
@devin-petersohn @mvashishtha are y'all able to reproduce this without a dataset? Or, ideally, simulate Modin's behavior here with just Ray Core?
cc @scv119 @rkooo567
After some reading, I think you might be able to use vmstat to monitor the swapped-out pages during the process; this is the best indication of thrashing. If it indeed is thrashing, then it seems we should set a limit on the object store size that can be allocated to /tmp by default. On Linux, we already raise an error if the /tmp object store is configured to be >10GB. However, on other platforms no error is raised: https://github.com/ray-project/ray/blob/master/python/ray/_private/services.py#L1851
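For reference, here is a minimal sketch of that monitoring approach. On macOS the command is vm_stat rather than vmstat, and the parsing below assumes its usual cumulative "Pageouts:" line format:

import subprocess
import time

def pageouts() -> int:
    # Parse the cumulative "Pageouts:" counter from macOS's vm_stat output.
    out = subprocess.check_output(["vm_stat"], text=True)
    for line in out.splitlines():
        if line.startswith("Pageouts:"):
            # e.g. "Pageouts:                         532563."
            return int(line.split()[-1].rstrip("."))
    return 0

before = pageouts()
# ... run the unwrap_partitions reproduction here ...
time.sleep(1)
after = pageouts()
# A large delta while the workload runs suggests the object store is
# being paged out to disk, i.e. thrashing.
print(f"pageouts during run: {after - before}")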
I ran the same script on my Mac and got the following result:
@wuisawesome it seems like his workload is slow without object spilling (whereas it is faster on your machine). Isn't there a possibility that the issue is where the plasma object store is located? (On Mac it is /tmp, and maybe his /tmp is slow; on Windows it probably uses shm.)
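If that is the cause, one way to test it is to pin the plasma store to a specific directory and cap its size when initializing Ray. A sketch, assuming Ray 1.x, where _plasma_directory is a private ray.init() argument and the 2 GB cap is arbitrary:

import ray

ray.init(
    # Cap the object store well below physical RAM to avoid paging.
    object_store_memory=2 * 10**9,
    # Private Ray 1.x argument: directory backing plasma-stored objects.
    # Point it at a filesystem you know is fast (e.g. a ramdisk).
    _plasma_directory="/tmp",
)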
Thanks for following up on this @devin-petersohn and thanks for the repro, we will prioritize it!