xarray: open_mfdataset parallel=True failing with netcdf4 >= 1.6.1

What happened?

When using the parallel=True keyword argument, open_mfdataset fails with "NetCDF: Unknown file format". Running the same command a second time (via try/except), or running it with parallel=False, executes as expected.

works:

xr.open_mfdataset(dirpath +'\\*.nc', parallel=False)

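works (the second call, made after the first one fails):

try:
    xr.open_mfdataset(dirpath +'\\*.nc', parallel=True)
except Exception:
    # retry the identical call once; the second attempt succeeds
    xr.open_mfdataset(dirpath +'\\*.nc', parallel=True)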

fails:

xr.open_mfdataset(dirpath +'\\*.nc', parallel=True)

[Errno -51] NetCDF: Unknown file format

All of the above use engine='netcdf4'. Any help is highly appreciated, as I’m a bit lost on how to investigate this further.
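Since the title ties the failure to netcdf4 >= 1.6.1, one quick check is whether the installed version falls in that range. Below is a minimal sketch using only the standard library. The helper names installed_version and at_least are made up for illustration, and the distribution name "netCDF4" is an assumption about how the package was installed:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def at_least(version_string, minimum):
    """Compare the leading numeric parts of a version string against a tuple."""
    # Only the first three runs of digits are used, so suffixes such as
    # "rc1" or ".post1" are ignored by this simplified comparison.
    numbers = tuple(int(n) for n in re.findall(r"\d+", version_string)[:3])
    return numbers >= minimum


nc_version = installed_version("netCDF4")  # assumed distribution name
if nc_version is None:
    print("netCDF4 is not installed in this environment")
else:
    print(nc_version, "is >= 1.6.1:", at_least(nc_version, (1, 6, 1)))
```

This only reports the version; it does not confirm that a given environment actually reproduces the error.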

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 23 (14 by maintainers)

Most upvoted comments

@jhamman Sorry for my delay — I started this the other day and got waylaid. I’ll try to get back to it today or tomorrow.

My workflow is my own laptop only

Use LocalCluster! 😉

The right fix is to disable threads, like in my example above

This fix will restrict you to serial compute.

You can also parallelize across processes using something like:

PBSCluster(
    ...,
    cores=1,
    processes=2,
)

or LocalCluster(threads_per_worker=1, ...).
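For the laptop case, the LocalCluster suggestion above could look roughly like the sketch below. It assumes dask.distributed and xarray are installed. The helper names, the worker count, and the "data/*.nc" pattern are placeholders for illustration, not something taken from this thread:

```python
def single_threaded_cluster_kwargs(n_workers=2):
    """Keyword arguments for a LocalCluster with one thread per worker process."""
    # processes=True asks for separate worker processes, so parallelism comes
    # from processes rather than from threads inside a single process.
    return {"n_workers": n_workers, "threads_per_worker": 1, "processes": True}


def open_with_process_workers(pattern, n_workers=2):
    """Open many netCDF files in parallel on single-threaded workers (sketch)."""
    # Imported lazily so the kwargs helper above works without dask installed.
    import xarray as xr
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(**single_threaded_cluster_kwargs(n_workers))
    client = Client(cluster)  # becomes the default scheduler for dask calls
    ds = xr.open_mfdataset(pattern, parallel=True, engine="netcdf4")
    return ds, client


# Hypothetical usage; on Windows, starting worker processes needs the
# __main__ guard:
#
# if __name__ == "__main__":
#     ds, client = open_with_process_workers("data/*.nc")
#     print(ds)
#     client.close()
```

Whether this avoids the error in a particular setup is untested here; it only applies the "one thread per worker" advice from the comments.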