xarray: New deep copy behavior in 2022.9.0 causes maximum recursion error

What happened?

I have a case where a Dataset to be written to a NetCDF file has “ancillary_variables” attributes with a circular dependence. For example, variable A has .attrs["ancillary_variables"] that contains variable B, and B has .attrs["ancillary_variables"] that contains A. With xarray 2022.9.0, making a deep copy of such an object fails with a maximum recursion error.

What did you expect to happen?

Circular dependencies are detected and avoided. No maximum recursion error.

Minimal Complete Verifiable Example

In [1]: import xarray as xr

In [2]: a = xr.DataArray(1.0, attrs={})

In [3]: b = xr.DataArray(2.0, attrs={})

In [4]: a.attrs["other"] = b

In [5]: b.attrs["other"] = a

In [6]: a_copy = a.copy(deep=True)
---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
Cell In [6], line 1
----> 1 a_copy = a.copy(deep=True)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/dataarray.py:1172, in DataArray.copy(self, deep, data)
   1104 def copy(self: T_DataArray, deep: bool = True, data: Any = None) -> T_DataArray:
   1105     """Returns a copy of this array.
   1106 
   1107     If `deep=True`, a deep copy is made of the data array.
   (...)
   1170     pandas.DataFrame.copy
   1171     """
-> 1172     variable = self.variable.copy(deep=deep, data=data)
   1173     indexes, index_vars = self.xindexes.copy_indexes(deep=deep)
   1175     coords = {}

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/variable.py:996, in Variable.copy(self, deep, data)
    989     if self.shape != ndata.shape:
    990         raise ValueError(
    991             "Data shape {} must match shape of object {}".format(
    992                 ndata.shape, self.shape
    993             )
    994         )
--> 996 attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
    997 encoding = copy.deepcopy(self._encoding) if deep else copy.copy(self._encoding)
    999 # note: dims is already an immutable tuple

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:231, in _deepcopy_dict(x, memo, deepcopy)
    229 memo[id(x)] = y
    230 for key, value in x.items():
--> 231     y[deepcopy(key, memo)] = deepcopy(value, memo)
    232 return y

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:153, in deepcopy(x, memo, _nil)
    151 copier = getattr(x, "__deepcopy__", None)
    152 if copier is not None:
--> 153     y = copier(memo)
    154 else:
    155     reductor = dispatch_table.get(cls)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/dataarray.py:1190, in DataArray.__deepcopy__(self, memo)
   1187 def __deepcopy__(self: T_DataArray, memo=None) -> T_DataArray:
   1188     # memo does nothing but is required for compatibility with
   1189     # copy.deepcopy
-> 1190     return self.copy(deep=True)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/dataarray.py:1172, in DataArray.copy(self, deep, data)
   1104 def copy(self: T_DataArray, deep: bool = True, data: Any = None) -> T_DataArray:
   1105     """Returns a copy of this array.
   1106 
   1107     If `deep=True`, a deep copy is made of the data array.
   (...)
   1170     pandas.DataFrame.copy
   1171     """
-> 1172     variable = self.variable.copy(deep=deep, data=data)
   1173     indexes, index_vars = self.xindexes.copy_indexes(deep=deep)
   1175     coords = {}

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/variable.py:996, in Variable.copy(self, deep, data)
    989     if self.shape != ndata.shape:
    990         raise ValueError(
    991             "Data shape {} must match shape of object {}".format(
    992                 ndata.shape, self.shape
    993             )
    994         )
--> 996 attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
    997 encoding = copy.deepcopy(self._encoding) if deep else copy.copy(self._encoding)
    999 # note: dims is already an immutable tuple

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:231, in _deepcopy_dict(x, memo, deepcopy)
    229 memo[id(x)] = y
    230 for key, value in x.items():
--> 231     y[deepcopy(key, memo)] = deepcopy(value, memo)
    232 return y

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:153, in deepcopy(x, memo, _nil)
    151 copier = getattr(x, "__deepcopy__", None)
    152 if copier is not None:
--> 153     y = copier(memo)
    154 else:
    155     reductor = dispatch_table.get(cls)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/dataarray.py:1190, in DataArray.__deepcopy__(self, memo)
   1187 def __deepcopy__(self: T_DataArray, memo=None) -> T_DataArray:
   1188     # memo does nothing but is required for compatibility with
   1189     # copy.deepcopy
-> 1190     return self.copy(deep=True)

    [... skipping similar frames: DataArray.copy at line 1172 (495 times), DataArray.__deepcopy__ at line 1190 (494 times), _deepcopy_dict at line 231 (494 times), Variable.copy at line 996 (494 times), deepcopy at line 146 (494 times), deepcopy at line 153 (494 times)]

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/variable.py:996, in Variable.copy(self, deep, data)
    989     if self.shape != ndata.shape:
    990         raise ValueError(
    991             "Data shape {} must match shape of object {}".format(
    992                 ndata.shape, self.shape
    993             )
    994         )
--> 996 attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
    997 encoding = copy.deepcopy(self._encoding) if deep else copy.copy(self._encoding)
    999 # note: dims is already an immutable tuple

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:231, in _deepcopy_dict(x, memo, deepcopy)
    229 memo[id(x)] = y
    230 for key, value in x.items():
--> 231     y[deepcopy(key, memo)] = deepcopy(value, memo)
    232 return y

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:153, in deepcopy(x, memo, _nil)
    151 copier = getattr(x, "__deepcopy__", None)
    152 if copier is not None:
--> 153     y = copier(memo)
    154 else:
    155     reductor = dispatch_table.get(cls)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/dataarray.py:1190, in DataArray.__deepcopy__(self, memo)
   1187 def __deepcopy__(self: T_DataArray, memo=None) -> T_DataArray:
   1188     # memo does nothing but is required for compatibility with
   1189     # copy.deepcopy
-> 1190     return self.copy(deep=True)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/dataarray.py:1172, in DataArray.copy(self, deep, data)
   1104 def copy(self: T_DataArray, deep: bool = True, data: Any = None) -> T_DataArray:
   1105     """Returns a copy of this array.
   1106
   1107     If `deep=True`, a deep copy is made of the data array.
   (...)
   1170     pandas.DataFrame.copy
   1171     """
-> 1172     variable = self.variable.copy(deep=deep, data=data)
   1173     indexes, index_vars = self.xindexes.copy_indexes(deep=deep)
   1175     coords = {}

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/variable.py:985, in Variable.copy(self, deep, data)
    982         ndata = indexing.MemoryCachedArray(ndata.array)
    984     if deep:
--> 985         ndata = copy.deepcopy(ndata)
    987 else:
    988     ndata = as_compatible_data(data)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:137, in deepcopy(x, memo, _nil)
    134 if memo is None:
    135     memo = {}
--> 137 d = id(x)
    138 y = memo.get(d, _nil)
    139 if y is not _nil:

RecursionError: maximum recursion depth exceeded while calling a Python object
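
The frames above repeat in a fixed cycle: copy.deepcopy on the attrs dict calls DataArray.__deepcopy__, which calls DataArray.copy(deep=True), which calls Variable.copy, which calls copy.deepcopy on the attrs again without the memo it was given. A minimal sketch of the same pattern without xarray follows. The Holder class is hypothetical and only imitates the call structure seen in the traceback; it is not xarray code.

```python
import copy


class Holder:
    """Hypothetical stand-in for an object that carries an attrs dict."""

    def __init__(self, value):
        self.value = value
        self.attrs = {}

    def copy(self, deep=True):
        new = Holder(self.value)
        # A fresh deepcopy call starts with an empty memo, so objects that
        # were already visited further up the call stack are not recognised.
        new.attrs = copy.deepcopy(self.attrs) if deep else copy.copy(self.attrs)
        return new

    def __deepcopy__(self, memo=None):
        # memo is dropped here, mirroring the pattern in the traceback.
        return self.copy(deep=True)


a = Holder(1.0)
b = Holder(2.0)
a.attrs["other"] = b
b.attrs["other"] = a

try:
    a.copy(deep=True)
    hit_recursion_limit = False
except RecursionError:
    hit_recursion_limit = True

print(hit_recursion_limit)
```

Each pass through copy() starts a new memo, so the copy of a never learns that a is already being copied, and the calls nest until the interpreter's recursion limit is reached.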

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I have at least one other issue related to the new xarray release, but I’m still tracking it down. I think it is also related to the deep copy behavior change, which was merged a day before the release, so our CI didn’t have time to test the “unstable” version of xarray.

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 5.19.0-76051900-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.6.1
pydap: None
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.2
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.9.1
distributed: 2022.9.1
matplotlib: 3.6.0
cartopy: 0.21.0
seaborn: None
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.4.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: 5.2.3

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 23 (16 by maintainers)

Most upvoted comments

It looks like that PR fixes all of my Satpy unit tests. I’m not sure how that is possible if it doesn’t also change when dask arrays are copied.

Is there some feature that python uses to check whether a data structure is recursive when it’s copying, which we’re not taking advantage of? I can look more later.
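
For built-in containers, copy.deepcopy already handles recursive structures: as the copy.py frames in the traceback show, it records each copied object in a memo dictionary keyed by id() and returns the recorded copy when it meets the same object again. A small sketch, with two plain dicts standing in for the attributes of the two variables:

```python
import copy

a = {"name": "a"}
b = {"name": "b"}
a["other"] = b
b["other"] = a  # a -> b -> a is now a cycle

a_copy = copy.deepcopy(a)

# The copy is a new object, and its cycle points back to the copy itself
# rather than to the original.
cycle_preserved = a_copy["other"]["other"] is a_copy
independent = a_copy is not a and a_copy["other"] is not b
print(cycle_preserved, independent)
```

This only works as long as every object in the structure takes part in the same memo, which is what a custom __deepcopy__ has to preserve.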

Yes, __deepcopy__(self, memo=None) has the memo argument precisely for dealing with recursion; see https://docs.python.org/3/library/copy.html. Currently, xarray’s __deepcopy__ methods do not pass the memo argument on when deep-copying their components.
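
A minimal sketch of what passing memo on looks like, assuming a hypothetical Holder class with an attrs dict. This illustrates the general pattern described in the Python copy documentation and is not the change made in xarray:

```python
import copy


class Holder:
    """Hypothetical stand-in for an object that carries an attrs dict."""

    def __init__(self, value):
        self.value = value
        self.attrs = {}

    def __deepcopy__(self, memo=None):
        if memo is None:
            memo = {}
        new = Holder(self.value)
        # Register the new object before descending, so that a reference
        # back to self found inside attrs resolves to `new`.
        memo[id(self)] = new
        new.attrs = copy.deepcopy(self.attrs, memo)
        return new


a = Holder(1.0)
b = Holder(2.0)
a.attrs["other"] = b
b.attrs["other"] = a

a_copy = copy.deepcopy(a)
b_copy = a_copy.attrs["other"]
print(b_copy.attrs["other"] is a_copy)
```

Because the same memo travels through every nested call, the second encounter with a returns the copy that is already in progress, and the circular reference is reproduced in the copy instead of recursing.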