dvc: dvc.api.open: fails if streaming a .h5 file (dvc 2.1)

Description

Streaming the same .h5 model file works in dvc 1.11.16, but raises the following exception in dvc 2.1:

Traceback (most recent call last):
  File "h5py/h5fd.pyx", line 155, in h5py.h5fd.H5FD_fileobj_read
SystemError: <built-in method flush of _io.BytesIO object at 0x7f82676084f0> returned a result with an error set

code snippet:

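import dvc.api
import h5py
import tensorflow as tf

# config_env is assumed to be a dict holding the repo, revision and remote name
with dvc.api.open('data/saved_model/model.h5',
                  repo=config_env['REPO'],
                  rev=config_env['REV'],
                  remote=config_env['REMOTE'],
                  mode='rb') as model_file:
    h5_fileobject = h5py.File(model_file, 'r')
    model = tf.keras.models.load_model(h5_fileobject)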
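
A possible workaround, sketched here under the assumption that the model file fits in memory: h5py seeks around in the file object it is given, so reading the remote stream sequentially into an in-memory buffer first may sidestep the problem. The helper name buffer_stream and the chunk size are hypothetical and not part of dvc or h5py, and this has not been verified against dvc 2.1.

```python
import io


def buffer_stream(stream, chunk_size=1024 * 1024):
    """Read a binary stream front to back into a seekable in-memory buffer."""
    buffer = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the consumer starts reading at byte 0
    return buffer


# Hypothetical usage with the snippet from this issue (not executed here):
#
#     with dvc.api.open('data/saved_model/model.h5',
#                       repo=config_env['REPO'],
#                       rev=config_env['REV'],
#                       remote=config_env['REMOTE'],
#                       mode='rb') as model_file:
#         seekable = buffer_stream(model_file)
#     h5_fileobject = h5py.File(seekable, 'r')
#     model = tf.keras.models.load_model(h5_fileobject)
```

The trade-off is that the whole file is held in memory, so this only makes sense for models of moderate size.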

Output of dvc doctor:

DVC version: 2.1.0 (pip)
---------------------------------
Platform: Python 3.8.6 on Linux-5.4.0-66-generic-x86_64-with-glibc2.29
Supports: http, https, webdav, webdavs

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Thank you very much.

Invoking the 1.11.x behavior is successful. Additionally, we plan to support the HTTP Range header in the future.

Thanks for the update. I updated to the latest master including webdav4, and now the issue is a different one: our server does not support ranges, which seem to be required.

  File "h5py/h5fd.pyx", line 150, in h5py.h5fd.H5FD_fileobj_get_eof
  File "h5py/h5fd.pyx", line 150, in h5py.h5fd.H5FD_fileobj_get_eof
  File "/.../.venv/lib/python3.8/site-packages/webdav4/fsspec.py", line 409, in seek
    return self.reader.seek(loc, whence=whence)
  File "/.../.venv/lib/python3.8/site-packages/webdav4/stream.py", line 195, in seek
    raise ValueError("server does not support ranges")
ValueError: server does not support ranges

Output of dvc doctor:

DVC version: 2.1.0 (pip)
---------------------------------
Platform: Python 3.8.6 on Linux-5.4.0-73-generic-x86_64-with-glibc2.29
Supports: hdfs, http, https, webdav, webdavs
Cache types: reflink, hardlink, symlink
Cache directory: btrfs on /dev/nvme0n1p1
Caches: local
Remotes: webdav, webdav, webdav, webdav
Workspace directory: btrfs on /dev/nvme0n1p1
Repo: dvc, git
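
For anyone wanting to check a server for range support before streaming: HTTP servers can advertise it with the standard Accept-Ranges response header. The sketch below only inspects a mapping of response headers. The function name supports_byte_ranges is hypothetical, and this is not necessarily how webdav4 decides. A server may also honour range requests without advertising them, so a False result is only a hint.

```python
def supports_byte_ranges(headers):
    """Return True if the response headers advertise byte-range support."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    accept = normalised.get("accept-ranges", "")
    # The header value is a comma-separated list of range units.
    units = [unit.strip().lower() for unit in accept.split(",")]
    return "bytes" in units


print(supports_byte_ranges({"Accept-Ranges": "bytes"}))   # True
print(supports_byte_ranges({"accept-ranges": "none"}))    # False
print(supports_byte_ranges({"Content-Length": "1024"}))   # False
```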