anndata: DataFrameView behavior changed with pandas 2.1.2, breaking AnnData in several ways

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the master branch of anndata.

Report

I am getting an infinite recursion, most likely due to this pandas bug: https://github.com/pandas-dev/pandas/issues/52927. The bug appeared a few days ago (probably with the latest pandas release). It is triggered by code I use to restore the old categories that get dropped after subsetting the data (as a workaround for https://github.com/scverse/anndata/issues/890).

The latest pandas main is supposed to fix the problem, but it looks like it doesn't. Maybe I should report the bug to pandas; I will now try to reproduce it with pandas code only.

Code:

from anndata import AnnData
import pandas as pd

def _inplace_fix_subset_categorical_obs(subset_adata: AnnData, original_adata: AnnData) -> None:
    """
    Fix categorical obs columns of subset_adata to match the categories of original_adata.

    Parameters
    ----------
    subset_adata
        The subset AnnData object
    original_adata
        The original AnnData object

    Notes
    -----
    See discussion here: https://github.com/scverse/anndata/issues/997
    """
    obs = subset_adata.obs
    for column in obs.columns:
        is_categorical = pd.api.types.is_categorical_dtype(obs[column])
        if is_categorical:
            c = obs[column].cat.set_categories(original_adata.obs[column].cat.categories)
            obs[column] = c

anndata = AnnData(obs=pd.DataFrame({"a": pd.Categorical(["a", "b", "c"])}))
subset = anndata[anndata.obs["a"] == "a"]
assert subset.obs['a'].cat.categories.tolist() == ["a"]
_inplace_fix_subset_categorical_obs(subset, anndata)
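As the FutureWarning in the traceback suggests, is_categorical_dtype can be replaced by an isinstance check against pd.CategoricalDtype. The sketch below shows the same category-restoring logic on plain pandas DataFrames only; restore_categories is a hypothetical helper, and unlike the function above it returns a new DataFrame instead of assigning into a view's .obs. Whether this avoids the recursion when applied to AnnData objects under pandas 2.1.2 has not been verified here.

```python
import pandas as pd


def restore_categories(subset_obs: pd.DataFrame, original_obs: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of subset_obs whose categorical
    columns carry the full category list of the matching column in original_obs."""
    fixed = subset_obs.copy()
    for column in fixed.columns:
        # Non-deprecated replacement for pd.api.types.is_categorical_dtype
        if isinstance(fixed[column].dtype, pd.CategoricalDtype):
            full = original_obs[column].cat.categories
            fixed[column] = fixed[column].cat.set_categories(full)
    return fixed


original = pd.DataFrame({"a": pd.Categorical(["a", "b", "c"])})
# Mimic what subsetting does in the report: keep one row, drop unused categories
subset = original[original["a"] == "a"].copy()
subset["a"] = subset["a"].cat.remove_unused_categories()
print(subset["a"].cat.categories.tolist())  # ['a']

fixed = restore_categories(subset, original)
print(fixed["a"].cat.categories.tolist())  # ['a', 'b', 'c']
```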

Traceback:

_inplace_fix_subset_categorical_obs(subset, anndata)
<ipython-input-2-91050d1e96e1>:21: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead
  is_categorical = pd.api.types.is_categorical_dtype(obs[column])
<ipython-input-2-91050d1e96e1>:24: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
  obs[column] = c
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
  container[idx] = value
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
  container[idx] = value
...
...
...
...
Traceback (most recent call last):
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3526, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-191c4157c214>", line 1, in <module>
    _inplace_fix_subset_categorical_obs(subset, anndata)
  File "<ipython-input-2-91050d1e96e1>", line 24, in _inplace_fix_subset_categorical_obs
    obs[column] = c
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py", line 79, in __setitem__
    container[idx] = value
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py", line 79, in __setitem__
    container[idx] = value
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py", line 79, in __setitem__
    container[idx] = value
  [Previous line repeated 2961 more times]
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py", line 78, in __setitem__
    with view_update(*self._view_args) as container:
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py", line 52, in view_update
    new = adata_view.copy()
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/anndata.py", line 1586, in copy
    return self._mutated_copy()
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/anndata.py", line 1527, in _mutated_copy
    return AnnData(**new)
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/anndata.py", line 362, in __init__
    self._init_as_actual(
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/anndata.py", line 548, in _init_as_actual
    self._var = _gen_dataframe(
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/anndata.py", line 186, in _gen_dataframe_df
    anno.columns = anno.columns.astype(str)
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 1073, in astype
    result = Index(new_values, name=self.name, dtype=new_values.dtype, copy=False)
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 482, in __new__
    name = maybe_extract_name(name, data, cls)
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 7579, in maybe_extract_name
    if name is None and isinstance(obj, (Index, ABCSeries)):
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/pandas/core/dtypes/generic.py", line 44, in _instancecheck
    return _check(inst) and not isinstance(inst, type)
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/pandas/core/dtypes/generic.py", line 38, in _check
    return getattr(inst, attr, "_typ") in comp
RecursionError: maximum recursion depth exceeded while calling a Python object
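The repeated container[idx] = value frames point at the mechanism: writing to a view first materializes a copy and then writes into that copy, and if the copy is itself still a view, the write starts over. The sketch below is a toy model of that loop only. ToyView, copy_returns_view and try_write are hypothetical names, and the code does not reproduce anndata's actual view_update implementation.

```python
class ToyView:
    """Hypothetical stand-in for a view whose writes go through a copy."""

    def __init__(self, data, copy_returns_view):
        self.data = dict(data)
        self.copy_returns_view = copy_returns_view
        self.actual = None

    def copy(self):
        # Models the reported pandas 2.1.2 symptom: copying a view yields a view again
        if self.copy_returns_view:
            return ToyView(self.data, self.copy_returns_view)
        return dict(self.data)

    def __setitem__(self, key, value):
        # Models view_update: materialize a copy, then write into it
        container = self.copy()
        container[key] = value  # re-enters __setitem__ if container is still a ToyView
        self.actual = container


def try_write(view):
    try:
        view["b"] = 2
        return "ok"
    except RecursionError:
        return "RecursionError"


good = ToyView({"a": 1}, copy_returns_view=False)
bad = ToyView({"a": 1}, copy_returns_view=True)
print(try_write(good))  # ok
print(try_write(bad))   # RecursionError
print(good.actual)      # {'a': 1, 'b': 2}
```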

Versions

-----
anndata             0.10.2
session_info        1.0.0
-----
asciitree           NA
cloudpickle         3.0.0
cython_runtime      NA
cytoolz             0.12.2
dask                2023.10.1
dateutil            2.8.2
exceptiongroup      1.1.3
fasteners           0.17.3
gmpy2               2.1.2
h5py                3.9.0
importlib_metadata  NA
jinja2              3.1.2
markupsafe          2.1.3
mpmath              1.3.0
msgpack             1.0.6
natsort             8.4.0
numcodecs           0.12.1
numpy               1.24.4
packaging           23.2
pandas              2.2.0.dev0+471.g984d75543
psutil              5.9.5
pyarrow             13.0.0
pytz                2023.3.post1
scipy               1.11.3
six                 1.16.0
sphinxcontrib       NA
sympy               1.12
tblib               2.0.0
tlz                 0.12.2
toolz               0.12.0
torch               2.1.0
torchgen            NA
tqdm                4.66.1
typing_extensions   NA
yaml                6.0.1
zarr                2.16.1
zipp                NA
zoneinfo            NA
-----
Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:09:17) [Clang 16.0.6 ]
macOS-13.4.1-arm64-arm-64bit
-----
Session information updated at 2023-10-28 17:20

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 17 (16 by maintainers)

Most upvoted comments

Merged!

I think I’ve found the problem:

import anndata as ad
import pandas as pd

adata = ad.AnnData(
    obs=pd.DataFrame(
        {"b": [1, 2, 3]},
        index=list("abc")
    )
)
v = adata[[0], :]  # v is a view, so v.obs is a DataFrameView

type(v.obs.copy())

In pandas 2.1.2:

anndata._core.views.DataFrameView

In pandas 2.1.1:

pandas.core.frame.DataFrame

It looks like pd.DataFrame._constructor_from_mgr is what changed.

This is a behavior change in a bug-fix release of pandas, so it may be a new pandas bug in its own right.
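To check the suspected change without anndata, a pandas-only probe can compare which type copy() returns for two DataFrame subclasses on the installed pandas version. KeepsType and NoConstructor are hypothetical names: the first uses the documented pattern of overriding _constructor, so its copies should keep the subclass type; the second inherits DataFrame._constructor unchanged. The NoConstructor result is the part that may differ between pandas releases. Whether a bare subclass like this reproduces what the comment above observed for DataFrameView on 2.1.2 is an assumption to verify, not something shown here.

```python
import pandas as pd


class KeepsType(pd.DataFrame):
    """Hypothetical subclass using the documented _constructor override."""

    @property
    def _constructor(self):
        return KeepsType


class NoConstructor(pd.DataFrame):
    """Hypothetical subclass that inherits DataFrame._constructor unchanged."""


print("pandas", pd.__version__)
results = {}
for cls in (KeepsType, NoConstructor):
    df = cls({"b": [1, 2, 3]}, index=list("abc"))
    # Row selection followed by copy() loosely mirrors v.obs.copy() in the comment above
    results[cls.__name__] = type(df.iloc[[0]].copy())
    print(cls.__name__, "->", results[cls.__name__].__name__)
```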

It’s unclear to me whether https://github.com/pandas-dev/pandas/issues/52927 is relevant to this bug in anndata.