cudf: [BUG] distributed dask_cudf.read_parquet much slower in ipython/jupyter vs python
**Describe the bug**
When doing a `dask_cudf.read_parquet` on a `LocalCUDACluster`, I see a significant performance difference between running the same code in an ipython/jupyter session and running it as a standalone Python script.
From the profiles, most of the delta appears to come from task deserialization time in both instances.
**Steps/Code to reproduce bug**

```python
from dask_cuda import LocalCUDACluster
from distributed import Client, wait, performance_report
import dask_cudf
import dask
import time

write_data = False
perf_report_fname = "py-perf.html"

if __name__ == "__main__":
    cluster = LocalCUDACluster()
    client = Client(cluster)

    if write_data:
        dask.datasets.timeseries(start="2022-01-01", end="2024-01-01").to_parquet("test_data.parquet")

    with performance_report(filename=perf_report_fname):
        ddf = dask_cudf.read_parquet("test_data.parquet")
        t1 = time.time()
        ddf = ddf.persist()
        wait(ddf)
        print(time.time() - t1)
```
This takes ~12 s in ipython vs ~4 s as a standalone script.
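One plausible contributor (not confirmed in the issue itself) is first-time import cost in the worker processes: Python caches modules in `sys.modules`, so only the first import of a heavy library like `cudf` pays the full disk-I/O and initialization cost. A minimal stdlib illustration of that caching, using `xml.dom.minidom` as an arbitrary stand-in for a heavy library:

```python
import importlib
import sys
import time


def timed_import(module_name):
    """Return wall-clock seconds taken by importlib.import_module."""
    t0 = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - t0


# The first import pays the full cost: file I/O, bytecode execution,
# and any extension-module initialization.
first = timed_import("xml.dom.minidom")

# A repeat import is just a sys.modules cache lookup.
repeat = timed_import("xml.dom.minidom")

print(f"first: {first:.6f}s  repeat: {repeat:.6f}s")
```

If the first task that touches `cudf` on each worker triggers this one-time cost during deserialization, it would show up exactly as the profile describes.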
**Expected behavior**
Comparable performance in both scenarios.
**Environment overview (please complete the following information)**
- Environment location: Bare-metal
- Method of cuDF install: conda 22.02 nightly

**Environment details**
- Python version: 3.8
- IPython version: 8.0.1
**Additional context**
- Attaching the performance report from both runs, which shows similar compute time but wildly different task deserialization time (visible in the task graph as well). perf_reports.zip
- Happened to observe this on `dask_cudf.read_parquet`, but it might be impacting other APIs as well.
- Doesn't seem to be an issue when using `dd.read_parquet` with the same cluster.
**About this issue**
- State: closed
- Created 2 years ago
- Comments: 23 (23 by maintainers)
@pentschev quickly wrote a PR in dask-cuda (https://github.com/rapidsai/dask-cuda/pull/854) that allows users to define pre-imports for the workers.
My testing showed this resolved the issue. @ayushdg, can you confirm?
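The issue thread does not include the exact invocation, so the following is a hypothetical sketch of how such a pre-import might be used, assuming the `--pre-import` worker flag introduced by that PR:

```shell
# Hypothetical sketch, assuming the --pre-import flag from
# rapidsai/dask-cuda#854: import cudf in each worker at startup,
# before any tasks (and their deserialization) arrive.
# The scheduler address below is a placeholder.
dask-cuda-worker tcp://scheduler-host:8786 --pre-import cudf
```

As a generic alternative that works on any distributed cluster, the library can also be warmed up after cluster creation with `client.run(lambda: __import__("cudf"))`, which forces the import on every worker before the first real task.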