anndata: [bug] Support integers as column names

Hi, I recently stumbled upon the following problem:

In[98]: adata
Out[98]: 
AnnData object with n_obs × n_vars = 20728 × 32738 
    obs: 0, 'batch', 'condition', 'source'
    var: 0, 1
In[99]: adata.write(file)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-99-c21b8d69b5d6>", line 1, in <module>
    adata.write(file)
  File "/usr/lib/python3.6/site-packages/anndata/base.py", line 1779, in write
    _write_h5ad(filename, self, compression=compression, compression_opts=compression_opts)
  File "/usr/lib/python3.6/site-packages/anndata/readwrite/write.py", line 94, in _write_h5ad
    d = adata._to_dict_fixed_width_arrays()
  File "/usr/lib/python3.6/site-packages/anndata/base.py", line 1926, in _to_dict_fixed_width_arrays
    obs_rec, uns_obs = df_to_records_fixed_width(self._obs)
  File "/usr/lib/python3.6/site-packages/anndata/base.py", line 176, in df_to_records_fixed_width
    uns[k + '_categories'] = c.cat.categories.values
TypeError: unsupported operand type(s) for +: 'int' and 'str'

It seems that df_to_records_fixed_width fails when a column name is an integer: it builds a key with k + '_categories', which raises a TypeError when k is not a string.
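
To illustrate the failure without anndata, here is a minimal sketch that uses only pandas. The `obs` frame and the `categories_keys` helper are hypothetical stand-ins, not anndata code; the helper only mimics the key construction shown on the last line of the traceback.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs: one categorical column named with the integer 0.
obs = pd.DataFrame({0: pd.Categorical(["a", "b", "a"]), "batch": ["x", "y", "x"]})


def categories_keys(df):
    """Mimic the key construction from the traceback (not the real anndata function)."""
    uns = {}
    for k in df.columns:
        c = df[k]
        if isinstance(c.dtype, pd.CategoricalDtype):
            # Fails when k is an int, because int + str is not defined.
            uns[k + '_categories'] = c.cat.categories.values
    return uns


try:
    categories_keys(obs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```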

The following works around the problem by converting all column names to strings:

adata.var.columns = adata.var.columns.astype(str)
adata.obs.columns = adata.obs.columns.astype(str)
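
The same conversion can be tried on a plain pandas DataFrame, as sketched below. `stringify_columns` and the sample `obs` frame are hypothetical names used only for illustration. Note one assumption: the frame must not contain both `0` and `'0'` as column names, since they would collide after conversion.

```python
import pandas as pd


def stringify_columns(df):
    """Return a copy of df whose column labels are all strings (hypothetical helper)."""
    out = df.copy()
    out.columns = out.columns.astype(str)
    return out


# Hypothetical stand-in for adata.obs with an integer column name.
obs = pd.DataFrame({0: pd.Categorical(["a", "b"]), "batch": ["x", "y"]})
fixed = stringify_columns(obs)

# String concatenation on the column names now succeeds.
keys = [k + "_categories" for k in fixed.columns]
print(keys)
```

Working on a copy leaves the original frame untouched; the in-place assignment above is equivalent if that is not a concern.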

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 18 (11 by maintainers)

Most upvoted comments

It is a bug! 🙂 Thanks for pointing it out!