scanpy: sc.read_h5ad randomly produces AnnDataReadError/OSError
I am trying to load some datasets with sc.read_h5ad(file_name). Frequently, I get the error below. When I re-run the code several times, or at different times, it sometimes works, but often I get the error (using the same code and data). This happens when reading different h5ad datasets (i.e. it is not specific to one dataset). At all times there seems to be enough free RAM, and a similar amount each time. This happens both in jupyter-notebook and in plain python without it.
Error:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
155 try:
--> 156 return func(elem, *args, **kwargs)
157 except Exception as e:
~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_group(group)
505 if "h5sparse_format" in group.attrs: # Backwards compat
--> 506 return SparseDataset(group).to_memory()
507
~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_core/sparse_dataset.py in to_memory(self)
370 mtx = format_class(self.shape, dtype=self.dtype)
--> 371 mtx.data = self.group["data"][...]
372 mtx.indices = self.group["indices"][...]
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/h5py/_hl/dataset.py in __getitem__(self, args)
572 fspace = selection.id
--> 573 self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
574
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5d.pyx in h5py.h5d.DatasetID.read()
h5py/_proxy.pyx in h5py._proxy.dset_rw()
h5py/_proxy.pyx in h5py._proxy.H5PY_H5Dread()
OSError: Can't read data (file read failed: time = Sat Aug 1 13:27:54 2020
, filename = '/path.../filtered_gene_bc_matrices.h5ad', file descriptor = 47, errno = 5, error message = 'Input/output error', buf = 0x55ec782e9031, total read size = 7011, bytes this sub-read = 7011, bytes actually read = 18446744073709551615, offset = 0)
During handling of the above exception, another exception occurred:
AnnDataReadError Traceback (most recent call last)
<ipython-input-14-faac769583f8> in <module>
17 #while True:
18 #try:
---> 19 adatas.append(sc.read_h5ad(file))
20 file_diffs.append('_'.join([file.split('/')[i] for i in diff_path_idx]))
21 #break
~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
411 d[k] = read_dataframe(f[k])
412 else: # Base case
--> 413 d[k] = read_attribute(f[k])
414
415 d["raw"] = _read_raw(f, as_sparse, rdasp)
~/miniconda3/envs/rpy2_3/lib/python3.8/functools.py in wrapper(*args, **kw)
873 '1 positional argument')
874
--> 875 return dispatch(args[0].__class__)(*args, **kw)
876
877 funcname = getattr(func, '__name__', 'singledispatch function')
~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
160 else:
161 parent = _get_parent(elem)
--> 162 raise AnnDataReadError(
163 f"Above error raised while reading key {elem.name!r} of "
164 f"type {type(elem)} from {parent}."
AnnDataReadError: Above error raised while reading key '/X' of type <class 'h5py._hl.group.Group'> from /.
Versions:
scanpy==1.5.1 anndata==0.7.4 umap==0.4.6 numpy==1.18.5 scipy==1.4.1 pandas==1.0.5 scikit-learn==0.23.1 statsmodels==0.11.1 python-igraph==0.8.2 louvain==0.6.1 leidenalg==0.8.1
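A note on reading the low-level message: the numbers in the OSError above can be decoded with the standard library alone. The sketch below rests on two assumptions that the report does not state: that errno 5 maps to EIO on this platform (as it does on Linux), and that "bytes actually read" is a signed return value printed as an unsigned 64-bit integer. The helper name as_signed_64 is made up for illustration.

```python
import errno
import os

# Values copied from the OSError message in the traceback above.
REPORTED_ERRNO = 5
REPORTED_BYTES_READ = 18446744073709551615


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed one (hypothetical helper)."""
    return int.from_bytes(value.to_bytes(8, "little"), "little", signed=True)


# Symbolic name of the error code, e.g. "EIO" on Linux.
errno_name = errno.errorcode[REPORTED_ERRNO]

# If the assumption holds, this is the raw return value of the failed read.
signed_bytes_read = as_signed_64(REPORTED_BYTES_READ)

print(errno_name, "-", os.strerror(REPORTED_ERRNO))
print("bytes actually read, as a signed value:", signed_bytes_read)
```

If that reading is right, the read call itself failed at the operating system or storage level, which would be consistent with the error appearing intermittently for the same file and code.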
About this issue
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 25 (11 by maintainers)
I’m pretty sure none of you are having the same issue as the original one reported here. Compare @abuchin's error message of KeyError: 'dict' to the original poster’s error of OSError: Can't read data. The thing you’re seeing is a new one stemming from an update to anndata. You’re trying to read in an h5ad file created with a newer version of the package with your older one. I think the cutoff point is 0.8.0 but I could be mistaken. Upgrade your anndata and you should be ok.
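To check which side of that cutoff an environment is on, something like the following could be used. It is a minimal sketch: the 0.8.0 cutoff is taken from the comment above, which itself flags it as uncertain, and the helper names release_tuple, is_older_than and installed_anndata_version are made up for illustration.

```python
import re
from importlib import metadata

# Cutoff taken from the comment above; the commenter was not certain of it.
ASSUMED_CUTOFF = (0, 8, 0)


def release_tuple(version: str) -> tuple:
    """Turn the leading numeric part of a version string into a 3-tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version string: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))[:3]
    # Pad short versions such as "0.8" so they compare as (0, 8, 0).
    return parts + (0,) * (3 - len(parts))


def is_older_than(version: str, cutoff: tuple = ASSUMED_CUTOFF) -> bool:
    """Return True if `version` sorts strictly before `cutoff`."""
    return release_tuple(version) < cutoff


def installed_anndata_version():
    """Return the installed anndata version string, or None if it is not installed."""
    try:
        return metadata.version("anndata")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_anndata_version()
    if found is None:
        print("anndata is not installed in this environment")
    elif is_older_than(found):
        print(f"anndata {found} is older than the assumed cutoff; try upgrading")
    else:
        print(f"anndata {found} is at or above the assumed cutoff")
```

This only reports the version in the environment that runs the script, so in a notebook it should be run from the same kernel that raises the KeyError.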
Found the same error in our internal workflows. Saved the data to h5py files, but could not open them anymore for some reason.
Error:
Versions
Package Version
absl-py 1.1.0 aiohttp 3.8.1 aiosignal 1.2.0 anndata 0.7.5 anndata2ri 1.0.6 annoy 1.17.0 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 asn1crypto 1.4.0 async-timeout 4.0.2 asynctest 0.13.0 attrs 20.3.0 backcall 0.2.0 beautifulsoup4 4.11.1 bleach 5.0.0 boto3 1.17.66 botocore 1.20.66 brotlipy 0.7.0 cached-property 1.5.2 cachetools 5.2.0 certifi 2020.12.5 cffi 1.14.5 chardet 4.0.0 charset-normalizer 2.0.12 chex 0.1.3 click 8.1.3 colormath 3.0.0 commonmark 0.9.1 conda 4.6.14 conda-package-handling 1.7.3 cryptography 3.4.7 cycler 0.10.0 Cython 0.29.30 decorator 5.0.7 defusedxml 0.7.1 dill 0.3.3 dm-tree 0.1.7 docrep 0.3.2 entrypoints 0.4 et-xmlfile 1.1.0 fa2 0.3.5 fastjsonschema 2.15.3 flatbuffers 2.0 flax 0.5.0 frozenlist 1.3.0 fsspec 2022.5.0 future 0.18.2 get-version 2.2 google-auth 2.6.6 google-auth-oauthlib 0.4.6 google-pasta 0.2.0 grpcio 1.46.3 h5py 3.2.1 idna 2.10 imageio 2.19.3 importlib-metadata 4.11.4 importlib-resources 5.7.1 ipykernel 5.5.4 ipython 7.23.1 ipython-genutils 0.2.0 ipywidgets 7.7.0 jax 0.3.13 jaxlib 0.3.10 jedi 0.18.0 Jinja2 3.1.2 jmespath 0.10.0 joblib 1.0.1 jsonschema 4.6.0 jupyter-client 6.1.12 jupyter-core 4.7.1 jupyterlab-pygments 0.2.2 jupyterlab-widgets 1.1.0 kiwisolver 1.3.1 legacy-api-wrap 1.2 leidenalg 0.8.4 llvmlite 0.35.0 loompy 3.0.7 louvain 0.7.0 Markdown 3.3.7 MarkupSafe 2.1.1 matplotlib 3.4.1 matplotlib-inline 0.1.2 mistune 0.8.4 msgpack 1.0.4 multidict 6.0.2 multipledispatch 0.6.0 multiprocess 0.70.11.1 natsort 7.1.1 nbclient 0.6.4 nbconvert 6.5.0 nbformat 5.4.0 nest-asyncio 1.5.5 networkx 2.5 notebook 6.4.11 numba 0.52.0 numexpr 2.7.3 numpy 1.19.5 numpy-groupies 0.9.17 numpyro 0.9.2 oauthlib 3.2.0 openpyxl 3.0.10 opt-einsum 3.3.0 optax 0.1.2 packaging 20.9 pandas 1.2.0 pandocfilters 1.5.0 parso 0.8.2 pathos 0.2.7 patsy 0.5.1 pexpect 4.8.0 pickleshare 0.7.5 Pillow 9.1.1 pip 21.1.1 pox 0.2.9 ppft 1.6.6.3 prometheus-client 0.14.1 prompt-toolkit 3.0.18 protobuf 3.19.0 protobuf3-to-dict 0.1.5 ptyprocess 0.7.0 pyasn1 0.4.8 
pyasn1-modules 0.2.8 pycosat 0.6.3 pycparser 2.20 pyDeprecate 0.3.1 Pygments 2.9.0 pyOpenSSL 20.0.1 pyparsing 2.4.7 pyro-api 0.1.2 pyro-ppl 1.8.1 pyrsistent 0.18.1 PySocks 1.7.1 python-dateutil 2.8.1 python-igraph 0.9.1 pytorch-lightning 1.5.10 pytz 2021.1 PyWavelets 1.3.0 PyYAML 6.0 pyzmq 22.0.3 requests 2.25.1 requests-oauthlib 1.3.1 rich 12.4.4 rpy2 3.4.2 rsa 4.8 ruamel-yaml-conda 0.15.80 ruamel.yaml 0.17.21 ruamel.yaml.clib 0.2.6 s3transfer 0.4.2 sagemaker 2.39.0.post0 scanpy 1.6.1 scikit-image 0.19.2 scikit-learn 0.24.2 scikit-misc 0.1.4 scipy 1.6.0 scrublet 0.2.3 scvi-tools 0.16.2 seaborn 0.11.1 Send2Trash 1.8.0 setuptools 59.5.0 setuptools-scm 6.0.1 sinfo 0.3.1 six 1.15.0 smdebug-rulesconfig 1.0.1 soupsieve 2.3.2.post1 spectra 0.0.11 statsmodels 0.12.2 stdlib-list 0.8.0 tables 3.6.1 tensorboard 2.9.0 tensorboard-data-server 0.6.1 tensorboard-plugin-wit 1.8.1 terminado 0.15.0 texttable 1.6.3 threadpoolctl 2.1.0 tifffile 2021.11.2 tinycss2 1.1.1 toolz 0.11.2 torch 1.11.0 torchmetrics 0.9.0 tornado 6.1 tqdm 4.60.0 traitlets 5.2.2.post1 typing-extensions 4.2.0 tzlocal 2.1 umap-learn 0.4.6 urllib3 1.26.4 wcwidth 0.2.5 webencodings 0.5.1 Werkzeug 2.1.2 wheel 0.36.2 widgetsnbextension 3.6.0 yarl 1.7.2 zipp 3.4.1
Has anyone found any solution to work around this issue?
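One stopgap, given that the original report says re-running the same read sometimes succeeds, is to retry a failed read a few times. This is only a sketch: retry_read, its parameters, and the choice to retry on OSError are illustrative assumptions, and retrying does nothing about whatever causes the I/O error in the first place.

```python
import time


def retry_read(reader, path, attempts=3, delay=1.0, retry_on=(OSError,)):
    """Call reader(path), retrying on the given exception types (hypothetical helper).

    Waits delay * attempt_number seconds between tries and re-raises the
    last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return reader(path)
        except retry_on as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay * attempt)
    raise last_error


# Possible usage, assuming scanpy is installed and the path exists:
#   import scanpy as sc
#   adata = retry_read(sc.read_h5ad, "some_file.h5ad")
```

Passing the reader in as an argument keeps the helper independent of scanpy, so the same wrapper could be used with any loader that takes a path.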
Great to hear! Usually when there’s weird, site-specific errors, I say I can’t help because I don’t have SSH access and “my crystal ball is currently out of order”.
Seems like my crystal ball worked just fine these days!
From my time in @theislab I infer this means it’s a network mount problem.
You can probably fix it by putting the file somewhere on the local file system. Since /home/* is network-mounted, that means /localscratch/ or /tmp/, I assume.
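The copy-to-local-disk suggestion could look roughly like this. It is a sketch under assumptions: read_via_local_copy is a made-up helper, the scratch directory is whatever local path exists on the machine, and the reader is expected to load the file fully into memory, because the temporary copy is deleted as soon as the function returns.

```python
import shutil
import tempfile
from pathlib import Path


def read_via_local_copy(src, reader, scratch_dir=None):
    """Copy src into a temporary local directory and call reader on the copy.

    scratch_dir is the parent for the temporary directory; None means the
    system default temp location. The copy is removed afterwards, so the
    reader must not keep the file open (e.g. no backed mode).
    """
    src = Path(src)
    with tempfile.TemporaryDirectory(dir=scratch_dir) as tmp:
        local_copy = Path(tmp) / src.name
        shutil.copy2(src, local_copy)
        return reader(local_copy)


# Possible usage, assuming scanpy is installed and /localscratch exists:
#   import scanpy as sc
#   adata = read_via_local_copy("/path/to/file.h5ad", sc.read_h5ad,
#                               scratch_dir="/localscratch")
```

The copy itself still reads from the network mount, so it may hit the same I/O error; the point is only that the many small reads done while parsing the file then go to local disk.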
I am having the same problem; however, pip install anndata --upgrade didn’t work for me. pip said it is already the latest version: Requirement already satisfied: anndata in d:\python3.10.9\lib\site-packages (0.9.1). I really don’t know what to do. Could you help me with that?