dask: `test_blockwise_dataframe_io[True-False-hdf]` is flaky on osx CI

I noticed this test failing intermittently on the osx CI today due to a pytables file-locking issue (it may have started earlier).
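For reference, the failing parameterization can be selected directly when trying to reproduce (node ID copied from the failure below; it relies on the test suite's distributed client fixture, so it needs to be run from a dask checkout):

    pytest "dask/tests/test_distributed.py::test_blockwise_dataframe_io[True-False-hdf]"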

Traceback
_________________ test_blockwise_dataframe_io[True-False-hdf] __________________
[gw0] darwin -- Python 3.8.12 /Users/runner/miniconda3/envs/test-environment/bin/python

c = <Client: 'tcp://127.0.0.1:50922' processes=2 threads=2, memory=28.00 GiB>
tmpdir = local('/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/popen-gw0/test_blockwise_dataframe_io_Tr3')
io = 'hdf', fuse = False, from_futures = True

    @pytest.mark.filterwarnings(
        "ignore:Running on a single-machine scheduler when a distributed client "
        "is active might lead to unexpected results."
    )
    @pytest.mark.parametrize(
        "io",
        ["parquet-pyarrow", "parquet-fastparquet", "csv", "hdf"],
    )
    @pytest.mark.parametrize("fuse", [True, False, None])
    @pytest.mark.parametrize("from_futures", [True, False])
    def test_blockwise_dataframe_io(c, tmpdir, io, fuse, from_futures):
        pd = pytest.importorskip("pandas")
        dd = pytest.importorskip("dask.dataframe")
    
        df = pd.DataFrame({"x": [1, 2, 3] * 5, "y": range(15)})
    
        if from_futures:
            parts = [df.iloc[:5], df.iloc[5:10], df.iloc[10:15]]
            futs = c.scatter(parts)
            ddf0 = dd.from_delayed(futs, meta=parts[0])
        else:
            ddf0 = dd.from_pandas(df, npartitions=3)
    
        if io.startswith("parquet"):
            if io == "parquet-pyarrow":
                pytest.importorskip("pyarrow.parquet")
                engine = "pyarrow"
            else:
                pytest.importorskip("fastparquet")
                engine = "fastparquet"
            ddf0.to_parquet(str(tmpdir), engine=engine)
            ddf = dd.read_parquet(str(tmpdir), engine=engine)
        elif io == "csv":
            ddf0.to_csv(str(tmpdir), index=False)
            ddf = dd.read_csv(os.path.join(str(tmpdir), "*"))
        elif io == "hdf":
            pytest.importorskip("tables")
            fn = str(tmpdir.join("h5"))
>           ddf0.to_hdf(fn, "/data*")

dask/tests/test_distributed.py:370: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
dask/dataframe/core.py:1620: in to_hdf
    return to_hdf(self, path_or_buf, key, mode, append, **kwargs)
dask/dataframe/io/hdf.py:251: in to_hdf
    compute_as_if_collection(
dask/base.py:319: in compute_as_if_collection
    return schedule(dsk2, keys, **kwargs)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/distributed/client.py:3015: in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/distributed/client.py:2167: in gather
    return self.sync(
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/distributed/utils.py:311: in sync
    return sync(
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/distributed/utils.py:378: in sync
    raise exc.with_traceback(tb)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/distributed/utils.py:351: in f
    result = yield future
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/tornado/gen.py:762: in run
    value = future.result()
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/distributed/client.py:2030: in _gather
    raise exception.with_traceback(traceback)
dask/dataframe/io/hdf.py:27: in _pd_to_hdf
    pd_to_hdf(*args, **kwargs)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/pandas/core/generic.py:2606: in to_hdf
    pytables.to_hdf(
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/pandas/io/pytables.py:277: in to_hdf
    with HDFStore(
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/pandas/io/pytables.py:561: in __init__
    self.open(mode=mode, **kwargs)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/pandas/io/pytables.py:710: in open
    self._handle = tables.open_file(self._path, self._mode, **kwargs)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/tables/file.py:300: in open_file
    return File(filename, mode, title, root_uep, filters, **kwargs)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/tables/file.py:750: in __init__
    self._g_new(filename, mode, **params)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   tables.exceptions.HDF5ExtError: HDF5 error back trace
E   
E     File "H5F.c", line 620, in H5Fopen
E       unable to open file
E     File "H5VLcallback.c", line 3502, in H5VL_file_open
E       failed to iterate over available VOL connector plugins
E     File "H5PLpath.c", line 579, in H5PL__path_table_iterate
E       can't iterate over plugins in plugin path '(null)'
E     File "H5PLpath.c", line 620, in H5PL__path_table_iterate_process_path
E       can't open directory: /Users/runner/miniconda3/envs/test-environment/lib/hdf5/plugin
E     File "H5VLcallback.c", line 3351, in H5VL__file_open
E       open failed
E     File "H5VLnative_file.c", line 97, in H5VL__native_file_open
E       unable to open file
E     File "H5Fint.c", line 1898, in H5F_open
E       unable to lock the file
E     File "H5FD.c", line 1625, in H5FD_lock
E       driver lock request failed
E     File "H5FDsec2.c", line 1002, in H5FD__sec2_lock
E       unable to lock file, errno = 35, error message = 'Resource temporarily unavailable'
E   
E   End of HDF5 error back trace
E   
E   Unable to open/create file '/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/popen-gw0/test_blockwise_dataframe_io_Tr3/h5'

tables/hdf5extension.pyx:486: HDF5ExtError

Log from failing test: https://github.com/dask/dask/runs/5558289646?check_suite_focus=true#step:6:21912

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 15 (9 by maintainers)

Most upvoted comments

I’ll keep an eye on it tomorrow, and if things seem better by EOD, I can close it then

Yes, I was about to open a PR testing that. I’m concerned that it came up at all, though, in the context of using a dask lock. It seems that the filesystem locking has a different perspective on things.

We can add that environment variable and say “we trust our lock more”, of course, but I was hoping to understand what’s going on a bit more.
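For reference, a minimal sketch of the environment-variable option, assuming the variable being discussed is HDF5_USE_FILE_LOCKING (not named explicitly in the thread); in CI it would typically be set in the job's environment before the tests run:

    import os

    # Sketch only: HDF5 provides HDF5_USE_FILE_LOCKING to turn off its own
    # OS-level advisory file locks. Setting it to "FALSE" is the "we trust
    # our lock more" option: dask's to_hdf already serializes writes to a
    # single file with its own lock, so HDF5's locking is, in principle,
    # redundant here.
    os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"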

I took a look at the last month of CI runs on main; this failure has happened twice in that window:
https://github.com/dask/dask/runs/7291695922?check_suite_focus=true
https://github.com/dask/dask/runs/7043653348?check_suite_focus=true

There was also an hdf-related segfault a few days ago which may be related: https://github.com/dask/dask/runs/7364095757?check_suite_focus=true

This is mostly a note to myself that this is still happening (though perhaps at a reduced frequency?)

Fair enough. I have opened #9154; if that seems to improve things, I'm happy to move on.