anndata: H5ad write failing to cast types properly

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the master branch of scanpy.

Hi there, I recently installed a fresh conda env as described on the scanpy page, and I'm suddenly hitting the issue below. A minimal example is provided. I saw old issues with a similar error, but they appear to be closed and fixed. Thanks for your help.


import anndata
import scanpy as sc
import pandas as pd
import scvelo as scv
import numpy as np

sc.logging.print_versions()

X=np.array([[0,1,0],[0,1,0],[0,1,0]])
obs=pd.DataFrame([['red',1,0.22222],['blue',0,np.nan],['orange',1,0.1]])
var=pd.DataFrame([['yes',1,2],[np.nan,0,np.nan],['no',1.1,0.1]])
adata=anndata.AnnData(X=X,obs=obs,var=var)
adata.write('/wynton/home/ye/mschmitz1/test.h5ad')



(scanpy) [mschmitz1@dev2 misc]$ python TestH5write.py
... storing 0 as categorical
Traceback (most recent call last):
  File "/wynton/home/ye/mschmitz1/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/wynton/home/ye/mschmitz1/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py", line 247, in write_dataframe
    col_names = [check_key(c) for c in df.columns]
  File "/wynton/home/ye/mschmitz1/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py", line 247, in <listcomp>
    col_names = [check_key(c) for c in df.columns]
  File "/wynton/home/ye/mschmitz1/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py", line 109, in check_key
    raise TypeError(f"{key} of type {typ} is an invalid key. Should be str.")
TypeError: 0 of type <class 'int'> is an invalid key. Should be str.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/wynton/home/ye/mschmitz1/code/macaque-dev-brain/misc/TestH5write.py", line 11, in <module>
    adata.write('/wynton/home/ye/mschmitz1/test.h5ad')
  File "/wynton/home/ye/mschmitz1/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_core/anndata.py", line 1905, in write_h5ad
    _write_h5ad(
  File "/wynton/home/ye/mschmitz1/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py", line 111, in write_h5ad
    write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
  File "/wynton/home/ye/mschmitz1/utils/miniconda3/envs/scanpy/lib/python3.9/functools.py", line 877, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/wynton/home/ye/mschmitz1/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/wynton/home/ye/mschmitz1/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
    raise type(e)(
TypeError: 0 of type <class 'int'> is an invalid key. Should be str.


Versions

anndata 0.7.6
scanpy 1.8.1
sinfo 0.3.4

PIL 8.3.2
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
h5py 3.4.0
igraph 0.9.6
joblib 1.0.1
kiwisolver 1.3.2
leidenalg 0.8.7
llvmlite 0.37.0
matplotlib 3.4.3
mpl_toolkits NA
natsort 7.1.1
numba 0.54.0
numexpr 2.7.3
numpy 1.20.3
packaging 21.0
pandas 1.3.3
pkg_resources NA
pyexpat NA
pyparsing 2.4.7
pytz 2021.1
scipy 1.5.3
scvelo 0.2.4
six 1.16.0
sklearn 1.0
tables 3.6.1
texttable 1.6.4
typing_extensions NA

Output of conda list (packages in environment at /wynton/home/ye/mschmitz1/utils/miniconda3/envs/scanpy):

Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
anndata 0.7.6 pypi_0 pypi
arpack 3.7.0 hc6cf775_2 conda-forge
blosc 1.21.0 h9c3ff4c_0 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2021.5.30 ha878542_0 conda-forge
certifi 2021.5.30 py39hf3d152e_0 conda-forge
click 8.0.1 pypi_0 pypi
cycler 0.10.0 py_2 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
glpk 4.65 h9202a9a_1004 conda-forge
gmp 6.2.1 h58526e2_0 conda-forge
h5py 3.4.0 pypi_0 pypi
hdf5 1.10.6 nompi_h3c11f04_101 conda-forge
icu 68.1 h58526e2_0 conda-forge
igraph 0.9.4 ha184e22_0 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
joblib 1.0.1 pyhd8ed1ab_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
kiwisolver 1.3.2 py39h1a9c180_0 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
leidenalg 0.8.7 py39he80948d_0 conda-forge
lerc 2.2.1 h9c3ff4c_0 conda-forge
libblas 3.9.0 8_openblas conda-forge
libcblas 3.9.0 8_openblas conda-forge
libdeflate 1.7 h7f98852_5 conda-forge
libffi 3.4.2 h9c3ff4c_4 conda-forge
libgcc-ng 11.2.0 h1d223b6_9 conda-forge
libgfortran-ng 7.5.0 h14aa051_19 conda-forge
libgfortran4 7.5.0 h14aa051_19 conda-forge
libgomp 11.2.0 h1d223b6_9 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 8_openblas conda-forge
libllvm11 11.1.0 hf817b99_2 conda-forge
libopenblas 0.3.12 pthreads_hb3c22a3_1 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_9 conda-forge
libtiff 4.3.0 hf544144_1 conda-forge
libwebp-base 1.2.1 h7f98852_0 conda-forge
libxml2 2.9.12 h72842e0_0 conda-forge
libzlib 1.2.11 h36c2ea0_1013 conda-forge
llvmlite 0.37.0 py39h1bbdace_0 conda-forge
loompy 3.0.6 pypi_0 pypi
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
lzo 2.10 h516909a_1000 conda-forge
matplotlib-base 3.4.3 py39h2fa2bec_1 conda-forge
metis 5.1.0 h58526e2_1006 conda-forge
mock 4.0.3 py39hf3d152e_1 conda-forge
mpfr 4.1.0 h9202a9a_1 conda-forge
natsort 7.1.1 pypi_0 pypi
ncurses 6.2 h58526e2_4 conda-forge
networkx 2.6.3 pypi_0 pypi
numba 0.54.0 py39h56b8d98_0 conda-forge
numexpr 2.7.3 py39hde0f152_0 conda-forge
numpy 1.20.3 py39hdbf815f_1 conda-forge
numpy-groupies 0.9.14 pypi_0 pypi
olefile 0.46 pyh9f0ad1d_1 conda-forge
openjpeg 2.4.0 hb52868f_1 conda-forge
openssl 3.0.0 h7f98852_1 conda-forge
packaging 21.0 pypi_0 pypi
pandas 1.3.3 py39hde0f152_0 conda-forge
patsy 0.5.2 pyhd8ed1ab_0 conda-forge
pillow 8.3.2 py39ha612740_0 conda-forge
pip 21.2.4 pyhd8ed1ab_0 conda-forge
pynndescent 0.5.4 pypi_0 pypi
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pytables 3.6.1 py39hf6dc253_3 conda-forge
python 3.9.7 hf930737_3_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-igraph 0.9.6 py39hfef886c_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytz 2021.1 pyhd8ed1ab_0 conda-forge
readline 8.1 h46c0cb4_0 conda-forge
scanpy 1.8.1 pypi_0 pypi
scikit-learn 1.0 py39h7c5d8c9_1 conda-forge
scipy 1.5.3 py39hf3f25e7_0 conda-forge
scvelo 0.2.4 pypi_0 pypi
seaborn 0.11.2 hd8ed1ab_0 conda-forge
seaborn-base 0.11.2 pyhd8ed1ab_0 conda-forge
setuptools 58.0.4 py39hf3d152e_2 conda-forge
sinfo 0.3.4 pypi_0 pypi
six 1.16.0 pyh6c4a22f_0 conda-forge
sqlite 3.36.0 h9cd32fc_2 conda-forge
statsmodels 0.13.0 py39hce5d2b2_0 conda-forge
stdlib-list 0.8.0 pypi_0 pypi
suitesparse 5.10.1 h9e50725_1 conda-forge
tbb 2021.3.0 h4bd325d_0 conda-forge
texttable 1.6.4 pyhd8ed1ab_0 conda-forge
threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge
tk 8.6.11 h27826a3_1 conda-forge
tornado 6.1 py39h3811e60_1 conda-forge
tqdm 4.62.3 pypi_0 pypi
typing-extensions 3.10.0.2 pypi_0 pypi
tzdata 2021b he74cb21_0 conda-forge
umap-learn 0.5.1 pypi_0 pypi
wheel 0.37.0 pyhd8ed1ab_1 conda-forge
xlrd 1.2.0 pypi_0 pypi
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.11 h36c2ea0_1013 conda-forge
zstd 1.5.0 ha95c52a_0 conda-forge

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (6 by maintainers)

Most upvoted comments

In case you're still interested in this: whenever that happens, I write adata.obs to a CSV, read that CSV back in and set the index (the dtypes sort themselves out on import), assign the result to adata.obs, and then write the h5ad. In my case this happens whenever I'm filling any NaNs; pandas then gets confused about dtypes, and manually fixing the dtypes never works.

# directory and o (output folder and file name) are defined elsewhere in my script
adata.obs.to_csv(directory + "temporary.csv")
# reading the CSV back lets pandas re-infer each column's dtype
metadata = pd.read_csv(directory + "temporary.csv")
# "cellID" is the name of the index column in my data
metadata.set_index("cellID", inplace=True)
adata.obs = metadata
adata.write_h5ad(directory + o + ".h5ad")

Maybe that helps someone who's googling for the same error.

@mtvector the problem is this line:

obs = pd.DataFrame([['red',1,0.22222],['blue',0,str(np.nan)],['orange',1,0.1]])

Specifically str(np.nan): a float column can’t contain strings, so you need to make sure there are no columns that mix strings and floats like that.
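For anyone wanting to check for this before writing, here is a minimal sketch using plain pandas. The helper name `mixed_type_columns` and the column names `color`, `flag`, and `score` are made up for illustration and are not part of anndata; using `pd.to_numeric` as the repair is just one possible choice and assumes the column is meant to be numeric.

```python
import numpy as np
import pandas as pd


def mixed_type_columns(df):
    """Return the names of columns whose non-missing values have more than one Python type."""
    mixed = []
    for name in df.columns:
        kinds = {type(value) for value in df[name].dropna()}
        if len(kinds) > 1:
            mixed.append(name)
    return mixed


# Same data as the line quoted above, with hypothetical string column names.
obs = pd.DataFrame(
    [["red", 1, 0.22222], ["blue", 0, str(np.nan)], ["orange", 1, 0.1]],
    columns=["color", "flag", "score"],
)

before = mixed_type_columns(obs)
print(before)  # ['score']: floats mixed with the string 'nan'

# One possible repair: coerce the column to numeric so 'nan' becomes a real NaN.
obs["score"] = pd.to_numeric(obs["score"], errors="coerce")
after = mixed_type_columns(obs)
print(after)  # []
```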

I generally strongly discourage using CSV for anything: 'NaN' is not the same as np.nan, and any data format that can’t express the difference will cause problems like this.

Make sure your data is properly stored and read in, otherwise not only anndata will cause problems.
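To illustrate the point about CSV, here is a small sketch of a round trip through an in-memory CSV with pandas defaults. The column names `cell` and `label` are hypothetical. The literal string 'NaN' and a real missing value are distinct before writing and indistinguishable after reading back.

```python
import io

import numpy as np
import pandas as pd

# "label" holds the literal string "NaN", a real missing value, and a normal string.
df = pd.DataFrame({"cell": ["a", "b", "c"], "label": ["NaN", np.nan, "ok"]})

# Round-trip through CSV in memory instead of a temporary file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
round_tripped = pd.read_csv(buffer)

missing_before = df["label"].isna().tolist()
missing_after = round_tripped["label"].isna().tolist()
print(missing_before)  # [False, True, False]
print(missing_after)   # [True, True, False]
```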

as someone who’s googling for the same error, your solution worked great!

We don’t have support for non-string column names of dataframes, since each column is saved individually on disk, and you can’t have non-string hdf5 keys or filenames. This is a similar restriction to parquet files.

Is there a use case other than a toy example where you would need these?

See also #498, #31

Transferred to the anndata repo.

I believe the issue you’re running into is that we require the columns of dataframes to have string identifiers, since these will be the keys in the hdf5 file.

This should let you write:

adata.obs.columns = adata.obs.columns.astype(str)
adata.var.columns = adata.var.columns.astype(str)
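As a rough sketch of why the original example trips this check, the snippet below rebuilds the `obs` frame from the report with plain pandas (no anndata needed) and applies the same `astype(str)` conversion. The variable `non_string` is only an illustrative name.

```python
import numpy as np
import pandas as pd

# Built without column names, so pandas assigns the integer labels 0, 1, 2.
obs = pd.DataFrame([["red", 1, 0.22222], ["blue", 0, np.nan], ["orange", 1, 0.1]])

# These integer labels are what the "invalid key" TypeError complains about.
non_string = [c for c in obs.columns if not isinstance(c, str)]
print(non_string)  # [0, 1, 2]

# Same conversion as suggested above, applied to the bare DataFrame.
obs.columns = obs.columns.astype(str)
print(list(obs.columns))  # ['0', '1', '2']
```

Passing explicit string names via `columns=[...]` when building the DataFrame avoids the problem in the first place.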