ray: [Bug] modin-on-ray's unwrap_partitions is 100x slower on Mac than on Windows
Search before asking
- I searched the issues and found no similar issues.
Ray Component
Ray Core
What happened + What you expected to happen
I was doing some development on Modin and wanted to call unwrap_partitions, which uses Ray to materialize the partitions constituting a Modin dataframe. I found that the function call took 122 seconds on my MacBook, but only 721 milliseconds on a Windows computer.
Versions / Dependencies
Before running the reproduction script, install Modin with
pip install modin
and download this file.
My Mac is on:
- MacBook Pro (16-inch, 2019), macOS Big Sur 11.5.2
- Python 3.8.8
- Ray 1.8.0
- Modin 0.11.1+37.g0a3acc15
- RAM: 16 GB 2667 MHz DDR4
ray.cluster_resources(): {'CPU': 16.0, 'object_store_memory': 3248034201.0, 'memory': 6496068404.0, 'node:127.0.0.1': 1.0}
The Windows computer is:
- Windows 10 machine with 10 cores
- Python 3.8.5
- Ray 1.7.1
- Modin 0.11.1+45.g41213581
Reproduction script
from modin.distributed.dataframe.pandas import unwrap_partitions
import modin.pandas as pd
import ray
from modin.config import NPartitions

fdf = pd.read_csv("test_700kx256.csv")
%time ray.wait(unwrap_partitions(fdf, axis=0), num_returns=NPartitions.get())
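The original CSV is not attached here. As a stand-in, a file of the shape implied by the filename (roughly 700k rows by 256 columns) can be generated with plain pandas; the column names and value distribution below are assumptions, and the resulting file is several GB:

import numpy as np
import pandas  # plain pandas, only used to write the stand-in file

# Assumed shape, inferred from the filename "test_700kx256.csv".
rows, cols = 700_000, 256
frame = pandas.DataFrame(
    np.random.rand(rows, cols),
    columns=[f"col{i}" for i in range(cols)],
)
frame.to_csv("test_700kx256.csv", index=False)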
Anything else
Some other Modin operations have been drastically slower on my Mac than on the same Windows machine. I have observed the same problem with other Macs. I have tried to provide a simple, reproducible example here.
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
About this issue
- State: closed
- Created 3 years ago
- Comments: 64 (49 by maintainers)
Haha wow, I didn’t realize this was actually faster on Windows! First time something is faster on Windows I think 😃
@devin-petersohn @mvashishtha are y'all able to reproduce this without a dataset? Or, ideally, simulate Modin's behavior here with just Ray Core?
cc @scv119 @rkooo567
After some reading, I think you might be able to use vmstat to monitor the swapped-out pages during the process; this is the best indication of thrashing. If it indeed is thrashing, then it seems we should set a limit on the object store size that can be allocated to /tmp by default. On Linux, we already raise an error if the /tmp object store is configured to be >10GB. However, on other platforms no error is raised: https://github.com/ray-project/ray/blob/master/python/ray/_private/services.py#L1851
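For reference, here is a minimal sketch of that monitoring approach. On macOS the command is vm_stat rather than vmstat, and the parsing below assumes its usual cumulative "Pageouts:" line format:

import subprocess
import time

def pageouts() -> int:
    # Parse the cumulative "Pageouts:" counter from macOS's vm_stat output.
    out = subprocess.check_output(["vm_stat"], text=True)
    for line in out.splitlines():
        if line.startswith("Pageouts:"):
            # e.g. "Pageouts:                         532563."
            return int(line.split()[-1].rstrip("."))
    return 0

before = pageouts()
# ... run the unwrap_partitions reproduction here ...
time.sleep(1)
after = pageouts()
# A large delta while the workload runs suggests the object store is
# being paged out to disk, i.e. thrashing.
print(f"pageouts during run: {after - before}")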
I ran the same script on my Mac and got the following result:
@wuisawesome it seems like his workload is slow without object spilling (whereas it is faster on your machine). Isn't there a possibility that the issue is where the plasma object store is located? (On Mac it is /tmp, and maybe his /tmp is slow; on Windows it probably uses shm.)
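If that is the cause, one way to test it is to pin the plasma store to a specific directory and cap its size when initializing Ray. A sketch, assuming Ray 1.x, where _plasma_directory is a private ray.init() argument and the 2 GB cap is arbitrary:

import ray

ray.init(
    # Cap the object store well below physical RAM to avoid paging.
    object_store_memory=2 * 10**9,
    # Private Ray 1.x argument: directory backing plasma-stored objects.
    # Point it at a filesystem you know is fast (e.g. a ramdisk).
    _plasma_directory="/tmp",
)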
Thanks for following up on this @devin-petersohn and thanks for the repro, we will prioritize it!