dask: `test_blockwise_dataframe_io[True-False-hdf]` is flaky on osx CI
Today I noticed this test periodically failing due to a PyTables/HDF5 file-locking issue (it may have started earlier).
Traceback
_________________ test_blockwise_dataframe_io[True-False-hdf] __________________
[gw0] darwin -- Python 3.8.12 /Users/runner/miniconda3/envs/test-environment/bin/python
c = <Client: 'tcp://127.0.0.1:50922' processes=2 threads=2, memory=28.00 GiB>
tmpdir = local('/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/popen-gw0/test_blockwise_dataframe_io_Tr3')
io = 'hdf', fuse = False, from_futures = True
    @pytest.mark.filterwarnings(
        "ignore:Running on a single-machine scheduler when a distributed client "
        "is active might lead to unexpected results."
    )
    @pytest.mark.parametrize(
        "io",
        ["parquet-pyarrow", "parquet-fastparquet", "csv", "hdf"],
    )
    @pytest.mark.parametrize("fuse", [True, False, None])
    @pytest.mark.parametrize("from_futures", [True, False])
    def test_blockwise_dataframe_io(c, tmpdir, io, fuse, from_futures):
        pd = pytest.importorskip("pandas")
        dd = pytest.importorskip("dask.dataframe")
        df = pd.DataFrame({"x": [1, 2, 3] * 5, "y": range(15)})
        if from_futures:
            parts = [df.iloc[:5], df.iloc[5:10], df.iloc[10:15]]
            futs = c.scatter(parts)
            ddf0 = dd.from_delayed(futs, meta=parts[0])
        else:
            ddf0 = dd.from_pandas(df, npartitions=3)
        if io.startswith("parquet"):
            if io == "parquet-pyarrow":
                pytest.importorskip("pyarrow.parquet")
                engine = "pyarrow"
            else:
                pytest.importorskip("fastparquet")
                engine = "fastparquet"
            ddf0.to_parquet(str(tmpdir), engine=engine)
            ddf = dd.read_parquet(str(tmpdir), engine=engine)
        elif io == "csv":
            ddf0.to_csv(str(tmpdir), index=False)
            ddf = dd.read_csv(os.path.join(str(tmpdir), "*"))
        elif io == "hdf":
            pytest.importorskip("tables")
            fn = str(tmpdir.join("h5"))
>           ddf0.to_hdf(fn, "/data*")
dask/tests/test_distributed.py:370:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask/dataframe/core.py:1620: in to_hdf
    return to_hdf(self, path_or_buf, key, mode, append, **kwargs)
dask/dataframe/io/hdf.py:251: in to_hdf
    compute_as_if_collection(
dask/base.py:319: in compute_as_if_collection
    return schedule(dsk2, keys, **kwargs)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/distributed/client.py:3015: in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/distributed/client.py:2167: in gather
    return self.sync(
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/distributed/utils.py:311: in sync
    return sync(
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/distributed/utils.py:378: in sync
    raise exc.with_traceback(tb)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/distributed/utils.py:351: in f
    result = yield future
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/tornado/gen.py:762: in run
    value = future.result()
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/distributed/client.py:2030: in _gather
    raise exception.with_traceback(traceback)
dask/dataframe/io/hdf.py:27: in _pd_to_hdf
    pd_to_hdf(*args, **kwargs)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/pandas/core/generic.py:2606: in to_hdf
    pytables.to_hdf(
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/pandas/io/pytables.py:277: in to_hdf
    with HDFStore(
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/pandas/io/pytables.py:561: in __init__
    self.open(mode=mode, **kwargs)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/pandas/io/pytables.py:710: in open
    self._handle = tables.open_file(self._path, self._mode, **kwargs)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/tables/file.py:300: in open_file
    return File(filename, mode, title, root_uep, filters, **kwargs)
../../../miniconda3/envs/test-environment/lib/python3.8/site-packages/tables/file.py:750: in __init__
    self._g_new(filename, mode, **params)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   tables.exceptions.HDF5ExtError: HDF5 error back trace
E
E     File "H5F.c", line 620, in H5Fopen
E       unable to open file
E     File "H5VLcallback.c", line 3502, in H5VL_file_open
E       failed to iterate over available VOL connector plugins
E     File "H5PLpath.c", line 579, in H5PL__path_table_iterate
E       can't iterate over plugins in plugin path '(null)'
E     File "H5PLpath.c", line 620, in H5PL__path_table_iterate_process_path
E       can't open directory: /Users/runner/miniconda3/envs/test-environment/lib/hdf5/plugin
E     File "H5VLcallback.c", line 3351, in H5VL__file_open
E       open failed
E     File "H5VLnative_file.c", line 97, in H5VL__native_file_open
E       unable to open file
E     File "H5Fint.c", line 1898, in H5F_open
E       unable to lock the file
E     File "H5FD.c", line 1625, in H5FD_lock
E       driver lock request failed
E     File "H5FDsec2.c", line 1002, in H5FD__sec2_lock
E       unable to lock file, errno = 35, error message = 'Resource temporarily unavailable'
E
E   End of HDF5 error back trace
E
E   Unable to open/create file '/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/popen-gw0/test_blockwise_dataframe_io_Tr3/h5'

tables/hdf5extension.pyx:486: HDF5ExtError
Log from failing test: https://github.com/dask/dask/runs/5558289646?check_suite_focus=true#step:6:21912
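For context, the failing `hdf` parametrization boils down to a two-worker local cluster writing all three partitions into one HDF5 file under different keys. A minimal sketch of that pattern (the file name, cluster sizing, and standalone-script form here are illustrative, not taken from the test suite):

```python
# Minimal sketch of what the failing parametrization exercises: several
# partitions written into one HDF5 file from a two-process local cluster.
# Requires pandas, dask, distributed, and tables (PyTables).
import pandas as pd
import dask.dataframe as dd
from distributed import Client

if __name__ == "__main__":
    client = Client(n_workers=2, threads_per_worker=1)

    df = pd.DataFrame({"x": [1, 2, 3] * 5, "y": range(15)})
    ddf = dd.from_pandas(df, npartitions=3)

    # All three partitions target the same .h5 file under keys data0..data2,
    # so the worker processes have to take turns opening the file for writing.
    ddf.to_hdf("out.h5", "/data*")

    client.close()
```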
I’ll keep an eye on it tomorrow, and if things seem better by EOD, I can close it then.
Yes, I was about to open a PR testing that. I’m concerned that it came up at all, though, given that the writes already go through a dask lock: it seems HDF5’s own file locking has a different view of who holds the file.
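For reference, the coordination being referred to is (roughly) a cluster-wide `distributed.Lock` taken around each write, along the lines of the sketch below; the file and lock names and the hand-rolled `write_partition` helper are illustrative, not dask’s actual implementation. Even with this task-level serialization, HDF5 still takes its own advisory lock when it opens the file, which is what fails with errno 35 in the trace above.

```python
# Sketch: serialize writes to one HDF5 file with a cluster-wide lock.
# "shared-out.h5", "hdf5-file-lock", and write_partition are illustrative.
import pandas as pd
from distributed import Client, Lock


def write_partition(part: pd.DataFrame, key: str) -> None:
    # Runs on a worker; the named Lock is shared across the whole cluster,
    # so only one process has the file open for writing at a time.
    with Lock("hdf5-file-lock"):
        part.to_hdf("shared-out.h5", key=key)


if __name__ == "__main__":
    client = Client(n_workers=2, threads_per_worker=1)
    df = pd.DataFrame({"x": range(6)})
    parts = [df.iloc[:3], df.iloc[3:]]
    futures = [
        client.submit(write_partition, part, f"/data{i}")
        for i, part in enumerate(parts)
    ]
    client.gather(futures)
    client.close()
```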
We can add that environment variable and say “we trust our lock more”, of course, but I was hoping to understand what’s going on a bit more.
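For the record, a sketch of that workaround, assuming the variable in question is `HDF5_USE_FILE_LOCKING` (HDF5’s documented switch for disabling its internal file locking; that this is the variable meant is my assumption):

```python
# Hypothetical workaround: turn off HDF5's own file locking and rely on
# dask's lock instead. HDF5_USE_FILE_LOCKING is HDF5's documented switch
# (HDF5 >= 1.10); it must be in the environment before libhdf5 is first
# loaded, including in the worker processes, so in CI it is simplest to
# export it for the whole job rather than set it inside a single test.
import os

os.environ.setdefault("HDF5_USE_FILE_LOCKING", "FALSE")
```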
I took a look at the last month of CI runs on `main`, and this has happened twice in that window:
- https://github.com/dask/dask/runs/7291695922?check_suite_focus=true
- https://github.com/dask/dask/runs/7043653348?check_suite_focus=true

There was also an HDF-related segfault a few days ago which may be related: https://github.com/dask/dask/runs/7364095757?check_suite_focus=true
This is mostly a note to myself that this is still happening (though perhaps less frequently).
Fair enough. I have opened #9154; if that seems to improve things, I’m happy to move on.