cudf: [BUG] distributed dask_cudf.read_parquet much slower in ipython/jupyter vs python

Describe the bug

When running dask_cudf.read_parquet on a LocalCUDACluster, I see a significant performance difference between running the same code in an IPython/Jupyter session and running it as a standalone Python script.

From the profiles, most of the delta appears to come from the time spent deserializing tasks, which is much larger in the IPython/Jupyter run.

Steps/Code to reproduce bug

from dask_cuda import LocalCUDACluster
from distributed import Client, wait, performance_report
import dask_cudf
import dask
import time

write_data = False
perf_report_fname = "py-perf.html"

if __name__ == "__main__":
    cluster = LocalCUDACluster()
    client = Client(cluster)

    if write_data:
        dask.datasets.timeseries(start="2022-01-01", end="2024-01-01").to_parquet("test_data.parquet")


    with performance_report(filename=perf_report_fname):
        ddf = dask_cudf.read_parquet("test_data.parquet")
        t1 = time.time()
        ddf = ddf.persist()
        wait(ddf)
        print(time.time() - t1)

This takes ~12s in IPython vs ~4s as a standalone script.

Expected behavior

Comparable performance in both scenarios.

Environment overview

  • Environment location: Bare-metal
  • Method of cuDF install: conda 22.02 nightly

Environment details

  • Python version: 3.8
  • IPython version: 8.0.1

Additional context

  1. Attaching the performance reports from both runs, which show similar compute time but wildly different task deserialization time (visible in the task graph as well): perf_reports.zip

  2. I observed this with dask_cudf.read_parquet, but it might affect other APIs as well.

  3. This doesn't seem to be an issue when using dd.read_parquet (dask.dataframe) with the same cluster.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 23 (23 by maintainers)

Most upvoted comments

@pentschev quickly wrote this PR in dask-cuda https://github.com/rapidsai/dask-cuda/pull/854 which allows users to define pre-imports like the following:

cluster = LocalCUDACluster(pre_import=["cudf"])

My testing showed this resolved the issue. @ayushdg can you confirm?
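
The thread does not spell out why pre-importing helps. A likely explanation, offered here as an assumption rather than something confirmed above, is that unpickling a task imports any module the payload references, so the first task on each worker pays the cudf import cost inside what the dashboard reports as deserialization time. The standard-library sketch below shows only that general pickle behavior. The fractions module stands in for cudf, a child interpreter stands in for a worker, and the helper name import_state_around_unpickle is made up for illustration.

```python
import pickle
import subprocess
import sys
from fractions import Fraction

# Stand-in for a serialized task that references a heavy library (cudf in the issue).
payload = pickle.dumps(Fraction(1, 2))

# Code run in a child interpreter, which plays the role of a freshly started worker.
# It reports whether the module was loaded before and after unpickling.
child_code = """
import pickle, sys
data = bytes.fromhex(sys.argv[1])
before = "fractions" in sys.modules
pickle.loads(data)
after = "fractions" in sys.modules
print(before, after)
"""


def import_state_around_unpickle(preload):
    """Return (loaded_before, loaded_after) for the stand-in module.

    preload=True mimics a pre-import by importing the module before
    any payload is deserialized (hypothetical helper, not a dask API).
    """
    code = ("import fractions\n" if preload else "") + child_code
    result = subprocess.run(
        # -I and -S keep the child free of site and user customizations.
        [sys.executable, "-I", "-S", "-c", code, payload.hex()],
        capture_output=True,
        text=True,
        check=True,
    )
    before, after = result.stdout.split()
    return before == "True", after == "True"


if __name__ == "__main__":
    print("no pre-import:", import_state_around_unpickle(preload=False))
    print("pre-import:   ", import_state_around_unpickle(preload=True))
```

Without the preload, the module is absent before pickle.loads and present afterwards, which means the import happened during deserialization. With the preload, the import is already done before any payload arrives, which is the effect pre_import is meant to have on workers. This does not explain why the cost differed between IPython and a standalone script; it only shows where an import can hide.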