xarray: Cannot pickle '_thread.lock' object exception after DataArray transpose and copy operations from netCDF file.

What is your issue?

I hit this issue while using rioxarray with a series of operations similar to those noted in this issue https://github.com/corteva/rioxarray/issues/614. After looking through the rioxarray codebase a bit I was able to reproduce the issue with pure xarray operations.

If the Dataset is opened with the default lock=True setting, transposing a DataArray’s dimensions and then copying the DataArray raises a cannot pickle '_thread.lock' object exception.

If the Dataset is opened with lock=False, no error is thrown.

This sample notebook reproduces the error.

This might be user error on my part, but it would be great to have some clarification on why lock=False is necessary here as my understanding was that this should only be necessary when using parallel write operations.
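For reference, a minimal self-contained sketch of the failure mode (the file name repro.nc and the tiny dataset are illustrative stand-ins, not taken from the original notebook; the deepcopy only fails when dask is not installed):

```python
import copy
import numpy as np
import xarray as xr

# Build a small file so we can read it back lazily (stand-in for the real data).
xr.Dataset({"air": (("x", "y"), np.zeros((2, 3)))}).to_netcdf("repro.nc")

ds = xr.open_dataset("repro.nc")  # default: lock=True
transposed = ds.air.transpose("y", "x")
try:
    copy.deepcopy(transposed)           # fails without dask installed
    outcome = "copy succeeded"
except TypeError as err:
    outcome = f"copy failed: {err}"     # cannot pickle '_thread.lock' object
ds.close()
print(outcome)
```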

About this issue

  • Original URL
  • State: closed
  • Created 8 months ago
  • Comments: 22 (14 by maintainers)

Most upvoted comments

TL;DR:

The current default of xr.open_dataset (netcdf4/h5netcdf backends) uses lazy loading, which falls back to threading.Lock as the default locking mechanism when dask is not available. That lock object cannot be pickled, and after some computations (here .transpose) the result can no longer be (deep)-copied either. The only workarounds are to pass lock=False explicitly when opening the file, or to call .load() or .compute() before pickling/copying.

Inspection:

Using the MCVE given here https://github.com/pydata/xarray/issues/8442#issuecomment-1841971760 I checked the types of the underlying array and how this works for transposing or not:

  • cache=True in open_dataset (default)

    • no transpose
      • before copy: <class 'xarray.core.indexing.MemoryCachedArray'>
      • after copy: <class 'xarray.core.indexing.MemoryCachedArray'>
      • trying to pickle raises TypeError: cannot pickle '_thread.lock' object in pickle
    • with transpose
      • before transpose: <class 'xarray.core.indexing.MemoryCachedArray'>
      • after transpose: <class 'xarray.core.indexing.LazilyVectorizedIndexedArray'>
      • trying to copy raises: TypeError: cannot pickle '_thread.lock' object in deepcopy
  • cache=False in open_dataset

    • no transpose
      • before copy: <class 'xarray.core.indexing.CopyOnWriteArray'>
      • after copy: <class 'xarray.core.indexing.CopyOnWriteArray'>
      • trying to pickle raises TypeError: cannot pickle '_thread.lock' object in pickle
    • with transpose
      • before transpose: <class 'xarray.core.indexing.CopyOnWriteArray'>
      • after transpose: <class 'xarray.core.indexing.LazilyVectorizedIndexedArray'>
      • trying to copy raises: TypeError: cannot pickle '_thread.lock' object in deepcopy

Reading with the netcdf4 and h5netcdf backends, the data is wrapped in xarray’s lazy indexing classes; see https://docs.xarray.dev/en/stable/user-guide/io.html#netcdf:

Data is always loaded lazily from netCDF files. You can manipulate, slice and subset Dataset and DataArray objects, and no array values are loaded into memory until you try to perform some sort of actual computation.

and further:

Xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, it is often a good idea to load a Dataset (or DataArray) entirely into memory by invoking the Dataset.load() method.

There is also a mention for Pickle:

https://docs.xarray.dev/en/stable/user-guide/io.html#pickle

When pickling an object opened from a NetCDF file, the pickle file will contain a reference to the file on disk. If you want to store the actual array values, load it into memory first with Dataset.load() or Dataset.compute().

What to do?

The pickle issue might not be the big problem, since the user is advised to load/compute beforehand. But the copy issue should be resolved somehow. Unfortunately I do not have an immediate solution. @pydata/xarray any ideas?

I believe the issue is these two default locks for HDF5 and netCDF: https://github.com/pydata/xarray/blob/2971994ef1dd67f44fe59e846c62b47e1e5b240b/xarray/backends/locks.py#L18

Probably the easiest way to handle this is to fork the code for SerializableLock from dask. It isn’t very complicated: https://github.com/dask/dask/blob/6f2100847e2042d459534294531e8884bef13a99/dask/utils.py#L1160
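The core idea behind dask’s SerializableLock could be sketched like this (a simplified illustration, not the actual dask code, which uses a WeakValueDictionary): only a token is pickled, never the lock itself, and each process reconstructs or looks up a plain threading.Lock for that token on unpickle.

```python
import pickle
import threading
import uuid


class SerializableLock:
    """Lock that pickles as a token; the real lock is per-process."""

    _locks: dict = {}  # token -> threading.Lock, local to this process

    def __init__(self, token=None):
        self.token = token or str(uuid.uuid4())
        # Reuse the process-local lock for this token if it already exists.
        self.lock = SerializableLock._locks.setdefault(self.token, threading.Lock())

    def __enter__(self):
        return self.lock.__enter__()

    def __exit__(self, *args):
        return self.lock.__exit__(*args)

    def __getstate__(self):
        return self.token  # only the token travels through pickle

    def __setstate__(self, token):
        self.__init__(token)


lock = SerializableLock()
restored = pickle.loads(pickle.dumps(lock))  # no '_thread.lock' in the pickle
print(restored.token == lock.token)          # same token
print(restored.lock is lock.lock)            # same per-process lock object
```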

(brief message to say thanks a lot @kmuehlbauer for the excellent summary)

@kmuehlbauer for me I don’t have the environment anymore, but I suspect I probably had dask installed in it and that’s why it was working.

@kmuehlbauer I experienced the error on Windows as well as WSL.

I tried a fresh env on Linux and still got the error 🤷

Versions
mamba create -n test-lock python=3.11 xarray pooch netcdf4 h5netcdf joblib
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.27.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.2
scipy: None
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

Edit: From the output above, the OP didn’t have Dask either. After adding dask-core to my env, the error is gone.

OK, here we go, I’ve taken dask out of the loop in a fresh env and can now reproduce both MCVEs.

Versions
INSTALLED VERSIONS
------------------
commit: None
python: 3.12.0 | packaged by conda-forge | (main, Oct  3 2023, 08:43:22) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150500.55.19-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.2
scipy: None
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.18.1
sphinx: None

No idea if it has the same underlying cause (I’m not transposing but am copying), but I do have a situation that used to work but now[^1] gives this same cannot pickle '_thread.lock' object error[^3]. I’ll have to see if I can make it into a minimal example. Tried downgrading some things in my environment to no avail.

Edit: here’s a little example[^2] experimenting with joblib.dump to see when the error is raised.

import xarray as xr
from joblib import dump

ds = xr.tutorial.load_dataset("air_temperature").isel(time=slice(4))
ds.to_netcdf("ds.nc", engine="netcdf4")
dump(ds, "ds.joblib")  # 0. Succeeds
ds.close()

# 1. Try to pickle the whole Dataset
ds = xr.open_dataset("ds.nc")
dump(ds, "ds.joblib")  # TypeError: cannot pickle '_thread.lock' object

# 2. Try to pickle a DataArray
ds = xr.open_dataset("ds.nc")
dump(ds.air, "ds.air.joblib")  # TypeError: cannot pickle '_thread.lock' object

# 3. Somehow adding a new variable makes it okay to pickle `ds.air` (and `ds` if `.copy()` applied)
ds = xr.open_dataset("ds.nc")
ds["b"] = xr.zeros_like(ds.air)
dump(ds.air, "ds.air.joblib")  # Succeeds
dump(ds, "ds.joblib")  # But this still fails
dump(ds.copy(), "ds.joblib")  # Succeeds
Versions
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.133.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2

xarray: 2023.12.0
pandas: 1.5.3
numpy: 1.26.2
scipy: 1.11.4
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.7.3
cartopy: 0.22.0
seaborn: 0.11.0
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: 7.4.3
mypy: 1.7.1
IPython: 8.18.1
sphinx: 5.3.0

Also tried in an env with HDF5 1.14.3, it didn’t help.

[^1]: First noticed a month or two ago, I think.
[^2]: Not super related to my real case, except that my case also involves joblib.
[^3]: Based on what happened later in this thread, maybe my old env where it was working had Dask available, for its SerializableLock, unlike this new env where I get the error.