anndata: DataFrameView in pandas 2.1.2 changed behavior, breaking AnnData in several ways
Please make sure these conditions are met
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of anndata.
- (optional) I have confirmed this bug exists on the master branch of anndata.
Report
I am getting an infinite recursion, most likely due to this pandas bug: https://github.com/pandas-dev/pandas/issues/52927. The bug appeared a few days ago (probably with the latest pandas release) and is triggered by code I use to restore the old categories that get dropped after subsetting the data (a workaround for https://github.com/scverse/anndata/issues/890).
The latest main from pandas is supposed to fix the problem, but it looks like it doesn't. Maybe I should report the bug to pandas; I will now try to reproduce it with pandas-only code.
Code:
##
from anndata import AnnData
import pandas as pd


def _inplace_fix_subset_categorical_obs(subset_adata: AnnData, original_adata: AnnData) -> None:
    """
    Fix categorical obs columns of subset_adata to match the categories of original_adata.

    Parameters
    ----------
    subset_adata
        The subset AnnData object
    original_adata
        The original AnnData object

    Notes
    -----
    See discussion here: https://github.com/scverse/anndata/issues/997
    """
    obs = subset_adata.obs
    for column in obs.columns:
        is_categorical = pd.api.types.is_categorical_dtype(obs[column])
        if is_categorical:
            c = obs[column].cat.set_categories(original_adata.obs[column].cat.categories)
            obs[column] = c


anndata = AnnData(obs=pd.DataFrame({"a": pd.Categorical(["a", "b", "c"])}))
subset = anndata[anndata.obs["a"] == "a"]
assert subset.obs['a'].cat.categories.tolist() == ["a"]
_inplace_fix_subset_categorical_obs(subset, anndata)
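One possible way to sidestep writing into a view is to build the corrected obs table on a plain copy. The sketch below uses pandas only: `restore_categories` is a hypothetical helper, not part of anndata, and the call to `remove_unused_categories` merely imitates the category pruning that happens on subsetting. It also uses the `isinstance(dtype, pd.CategoricalDtype)` check in place of the deprecated `is_categorical_dtype`. Whether assigning the result back to an AnnData view avoids the recursion has not been verified here.

```python
import pandas as pd


def restore_categories(subset_obs: pd.DataFrame, original_obs: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of subset_obs whose categorical columns use original_obs's categories.

    Hypothetical helper for illustration; it never writes into its inputs.
    """
    fixed = subset_obs.copy()  # plain DataFrame copy, so no view is modified
    for column in fixed.columns:
        if isinstance(fixed[column].dtype, pd.CategoricalDtype):
            categories = original_obs[column].cat.categories
            fixed[column] = fixed[column].cat.set_categories(categories)
    return fixed


original = pd.DataFrame({"a": pd.Categorical(["a", "b", "c"])})

# Assumption for this sketch: imitate the pruning of unused categories on subsetting.
subset = original[original["a"] == "a"].copy()
subset["a"] = subset["a"].cat.remove_unused_categories()

fixed = restore_categories(subset, original)
print(subset["a"].cat.categories.tolist())  # ['a']
print(fixed["a"].cat.categories.tolist())   # ['a', 'b', 'c']
```

Returning a new frame rather than mutating in place keeps the helper independent of how the caller's container handles item assignment.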
Traceback:
_inplace_fix_subset_categorical_obs(subset, anndata)
<ipython-input-2-91050d1e96e1>:21: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, pd.CategoricalDtype) instead
is_categorical = pd.api.types.is_categorical_dtype(obs[column])
<ipython-input-2-91050d1e96e1>:24: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
obs[column] = c
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
container[idx] = value
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
container[idx] = value
...
...
...
...
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
container[idx] = value
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
container[idx] = value
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
container[idx] = value
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
container[idx] = value
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
container[idx] = value
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
container[idx] = value
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
container[idx] = value
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
container[idx] = value
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
container[idx] = value
/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py:79: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
container[idx] = value
Traceback (most recent call last):
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3526, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-7-191c4157c214>", line 1, in <module>
_inplace_fix_subset_categorical_obs(subset, anndata)
File "<ipython-input-2-91050d1e96e1>", line 24, in _inplace_fix_subset_categorical_obs
obs[column] = c
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py", line 79, in __setitem__
container[idx] = value
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py", line 79, in __setitem__
container[idx] = value
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py", line 79, in __setitem__
container[idx] = value
[Previous line repeated 2961 more times]
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py", line 78, in __setitem__
with view_update(*self._view_args) as container:
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/views.py", line 52, in view_update
new = adata_view.copy()
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/anndata.py", line 1586, in copy
return self._mutated_copy()
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/anndata.py", line 1527, in _mutated_copy
return AnnData(**new)
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/anndata.py", line 362, in __init__
self._init_as_actual(
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/anndata.py", line 548, in _init_as_actual
self._var = _gen_dataframe(
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/functools.py", line 889, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/_core/anndata.py", line 186, in _gen_dataframe_df
anno.columns = anno.columns.astype(str)
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 1073, in astype
result = Index(new_values, name=self.name, dtype=new_values.dtype, copy=False)
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 482, in __new__
name = maybe_extract_name(name, data, cls)
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 7579, in maybe_extract_name
if name is None and isinstance(obj, (Index, ABCSeries)):
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/pandas/core/dtypes/generic.py", line 44, in _instancecheck
return _check(inst) and not isinstance(inst, type)
File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/pandas/core/dtypes/generic.py", line 38, in _check
return getattr(inst, attr, "_typ") in comp
RecursionError: maximum recursion depth exceeded while calling a Python object
Versions
-----
anndata 0.10.2
session_info 1.0.0
-----
asciitree NA
cloudpickle 3.0.0
cython_runtime NA
cytoolz 0.12.2
dask 2023.10.1
dateutil 2.8.2
exceptiongroup 1.1.3
fasteners 0.17.3
gmpy2 2.1.2
h5py 3.9.0
importlib_metadata NA
jinja2 3.1.2
markupsafe 2.1.3
mpmath 1.3.0
msgpack 1.0.6
natsort 8.4.0
numcodecs 0.12.1
numpy 1.24.4
packaging 23.2
pandas 2.2.0.dev0+471.g984d75543
psutil 5.9.5
pyarrow 13.0.0
pytz 2023.3.post1
scipy 1.11.3
six 1.16.0
sphinxcontrib NA
sympy 1.12
tblib 2.0.0
tlz 0.12.2
toolz 0.12.0
torch 2.1.0
torchgen NA
tqdm 4.66.1
typing_extensions NA
yaml 6.0.1
zarr 2.16.1
zipp NA
zoneinfo NA
-----
Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:09:17) [Clang 16.0.6 ]
macOS-13.4.1-arm64-arm-64bit
-----
Session information updated at 2023-10-28 17:20
About this issue
- State: closed
- Created 8 months ago
- Comments: 17 (16 by maintainers)
Merged!
I think I’ve found the problem:
in pandas 2.1.2:
in pandas 2.1.1:
It looks like pd.DataFrame._constructor_from_mgr is what changed. This is a behavior change in a bug-fix release of pandas, so it is possibly a new pandas bug in and of itself.
It’s unclear to me whether https://github.com/pandas-dev/pandas/issues/52927 is relevant to this bug in anndata.
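The repeated `container[idx] = value` frames in the traceback suggest that the copy made while handling a write is itself still treated as a view, so every write triggers another copy. A toy model of that failure mode, using only the standard library, might look like the following. `ViewContainer`, `PlainContainer`, and the `copy_keeps_view_type` flag are invented for illustration and are not anndata or pandas code; the flag stands in for whatever changed in how copies are constructed.

```python
class PlainContainer(dict):
    """Stands in for an ordinary, non-view container."""


class ViewContainer(dict):
    """Toy view: writes are redirected to a copy (hypothetical, not anndata's class)."""

    def __init__(self, data, copy_keeps_view_type):
        super().__init__(data)  # dict.__init__ does not call our __setitem__
        self.copy_keeps_view_type = copy_keeps_view_type

    def materialize(self):
        # Healthy behaviour: copying a view yields a plain container.
        # Faulty behaviour: the copy is again a view, so writes never land.
        if self.copy_keeps_view_type:
            return ViewContainer(self, copy_keeps_view_type=True)
        return PlainContainer(self)

    def __setitem__(self, key, value):
        container = self.materialize()
        container[key] = value  # recurses if container is still a view
        self.last_written = container


healthy = ViewContainer({"a": 1}, copy_keeps_view_type=False)
healthy["b"] = 2
print(healthy.last_written)  # {'a': 1, 'b': 2}

broken = ViewContainer({"a": 1}, copy_keeps_view_type=True)
try:
    broken["b"] = 2
    hit_recursion = False
except RecursionError:
    hit_recursion = True
print(hit_recursion)  # True
```

In this model the fix is to make sure the materialized copy is a plain container; how that maps onto the actual anndata and pandas classes is not something this sketch establishes.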