xarray: Decoding netCDF is giving incorrect values for a large file

What happened:

0 value is decoded as 2

What you expected to happen:

Data encoded to -32766 should translate to 0

Minimal Complete Verifiable Example:

The first example uses the base file I’ve been working with, a 9GB packed netCDF. The first 12 values in this lookup should be 0 but are decoded as 2.

$ xarray.open_dataset("BIG_FILE_packed.nc").ssrd.isel(time=slice(0, 23)).sel(latitude=44.8, longitude=287.1, method="nearest").values
> array([2.000000e+00, 2.000000e+00, 2.000000e+00, 2.000000e+00,
         2.000000e+00, 2.000000e+00, 2.000000e+00, 2.000000e+00,
         2.000000e+00, 2.000000e+00, 2.000000e+00, 2.000000e+00,
         2.565200e+04, 3.547440e+05, 1.091760e+06, 2.170378e+06,
         3.482364e+06, 4.704884e+06, 5.689655e+06, 6.297786e+06,
         6.534908e+06, 6.543667e+06, 6.543667e+06], dtype=float32)

This second example shows that when the file is opened with automatic mask_and_scale disabled, the raw packed value is -32766, and applying the scale factor and add offset to that value by hand in the interpreter gives 0.

$ xarray.open_dataset("BIG_FILE_packed.nc", mask_and_scale=False).ssrd.isel(time=slice(0, 23)).sel(\
                      latitude=44.8, longitude=287.1, method="nearest").values
> array([-32766, -32766, -32766, -32766, -32766, -32766, -32766, -32766,
         -32766, -32766, -32766, -32766, -32725, -32199, -31021, -29297,
         -27200, -25246, -23672, -22700, -22321, -22307, -22307], dtype=int16)

$ xarray.open_dataset("BIG_FILE_packed.nc", mask_and_scale=False).ssrd.isel(time=slice(0, 23)).sel(\
                      latitude=44.8, longitude=287.1, method="nearest").values[0] * \
  xarray.open_dataset("BIG_FILE_packed.nc").ssrd.encoding["scale_factor"] + xarray.open_dataset("BIG_FILE_packed.nc").ssrd.encoding["add_offset"]
> 0.0
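
A minimal sketch of the same by-hand decoding, for anyone who wants to check whether arithmetic precision could play a part. That precision is the cause here is only an assumption; nothing in this report confirms it. The scale_factor and add_offset below are hypothetical (not read from BIG_FILE_packed.nc), the helper names are made up for illustration, and float32 is emulated with the standard library so the snippet runs without the file. The relevant fact is that float32 values above 2**24 (about 1.68e7) are spaced 2 apart, so an intermediate product of that size can be off by 1 before the offset is added.

```python
import struct


def to_f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


def decode_f64(packed, scale_factor, add_offset):
    """Unpack one value with plain Python floats (float64)."""
    return packed * scale_factor + add_offset


def decode_f32(packed, scale_factor, add_offset):
    """Unpack one value, rounding every intermediate result to float32."""
    product = to_f32(to_f32(packed) * to_f32(scale_factor))
    return to_f32(product + to_f32(add_offset))


# Hypothetical packing attributes, chosen so that -32766 unpacks to 0.
scale_factor = 625.0
add_offset = 32766 * scale_factor

# Packed int16 values taken from the mask_and_scale=False output above.
for packed in (-32766, -32725, -22307):
    f64 = decode_f64(packed, scale_factor, add_offset)
    f32 = decode_f32(packed, scale_factor, add_offset)
    print(packed, f64, f32, f32 - f64)
```

To run the same comparison against the real file, the two hypothetical constants would be replaced with the values from ssrd.encoding["scale_factor"] and ssrd.encoding["add_offset"].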

When the netCDF is unpacked using the nco command line tool, the resulting values are correct.

$ xarray.open_dataset("BIG_FILE_unpacked.nc").ssrd.isel(time=slice(0, 23)).sel(latitude=44.8, longitude=287.1, method="nearest").values
> array([      0.        ,       0.        ,       0.        ,
               0.        ,       0.        ,       0.        ,
               0.        ,       0.        ,       0.        ,
               0.        ,       0.        ,       0.        ,
           25651.61906215,  354743.1221522 , 1091757.933255  ,
         2170377.23235622, 3482363.69999847, 4704882.32554591,
         5689654.23783437, 6297785.304381  , 6534906.36839455,
         6543665.4578304 , 6543665.4578304 ])

Something else that may be relevant: another file containing the same packed data, but as a much smaller subset (1.7KB) of the big file, is unpacked correctly.

$ xarray.open_dataset("SMALL_FILE_packed.nc").ssrd.isel(time=slice(0, 23)).sel(latitude=44.8, longitude=287.1, method="nearest").values
> array([      0.  ,       0.  ,       0.  ,       0.  ,       0.  ,
               0.  ,       0.  ,       0.  ,       0.  ,       0.  ,
               0.  ,       0.  ,   25545.75,  354397.5 , 1091577.  ,
         2170077.  , 3482645.8 , 4704689.  , 5689927.  , 6297856.5 ,
         6535169.  , 6543583.  , 6543583.  ], dtype=float32)

For this to be a real verifiable example, I can transfer the 9GB file to someone or give instructions on how to download it from the climate API I’m getting it from. I’m not sure if this is an issue with xarray, the API, or something I’m doing wrong. I’ve mostly been using an older version of xarray, but I also tested on the most recent version available on pip:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.9.6 (default, Jul 8 2021, 20:44:16) [GCC 5.4.0 20160609]
python-bits: 64
OS: Linux
OS-release: 4.4.0-200-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.18.2
pandas: 1.3.0
numpy: 1.21.0
scipy: None
netCDF4: 1.5.7
pydap: None
h5netcdf: 0.11.0
h5py: 3.3.0
Nio: None
zarr: None
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.9.0
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 51.3.3
pip: 20.3.3
conda: None
pytest: None
IPython: 7.25.0
sphinx: None

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

@ohsqueezy You might also try engine="h5netcdf" (h5py/h5netcdf packages needed). And would it be possible to create a small subset of that file via netCDF4 to share?