xarray: Decoding netCDF is giving incorrect values for a large file
What happened:
A value of 0 is decoded as 2.
What you expected to happen:
Data encoded to -32766 should translate to 0
Minimal Complete Verifiable Example:
The first example is the base file I’ve been using, which is a 9GB packed netCDF. The first 12 values in this lookup should be 0 but are getting decoded as 2.
$ xarray.open_dataset("BIG_FILE_packed.nc").ssrd.isel(time=slice(0, 23)).sel(latitude=44.8, longitude=287.1, method="nearest").values
> array([2.000000e+00, 2.000000e+00, 2.000000e+00, 2.000000e+00,
2.000000e+00, 2.000000e+00, 2.000000e+00, 2.000000e+00,
2.000000e+00, 2.000000e+00, 2.000000e+00, 2.000000e+00,
2.565200e+04, 3.547440e+05, 1.091760e+06, 2.170378e+06,
3.482364e+06, 4.704884e+06, 5.689655e+06, 6.297786e+06,
6.534908e+06, 6.543667e+06, 6.543667e+06], dtype=float32)
This second example shows that when the file is opened without automatic mask_and_scale, the raw value is -32766, and applying the scale factor and add offset to that value by hand in the interpreter gives 0.
$ xarray.open_dataset("BIG_FILE_packed.nc", mask_and_scale=False).ssrd.isel(time=slice(0, 23)).sel(\
latitude=44.8, longitude=287.1, method="nearest").values
> array([-32766, -32766, -32766, -32766, -32766, -32766, -32766, -32766,
-32766, -32766, -32766, -32766, -32725, -32199, -31021, -29297,
-27200, -25246, -23672, -22700, -22321, -22307, -22307], dtype=int16)
$ xarray.open_dataset("BIG_FILE_packed.nc", mask_and_scale=False).ssrd.isel(time=slice(0, 23)).sel(\
latitude=44.8, longitude=287.1, method="nearest").values[0] * \
xarray.open_dataset("BIG_FILE_packed.nc").ssrd.encoding["scale_factor"] + xarray.open_dataset("BIG_FILE_packed.nc").ssrd.encoding["add_offset"]
> 0.0
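The manual check above can be repeated in NumPy at two precisions. This is only a sketch: the real scale_factor and add_offset are not shown in this report, so the values below are hypothetical, chosen so that -32766 maps to 0 and the step size is roughly what the outputs suggest. The big-file result is reported as float32, and float32 can only represent even integers between 2^24 and 2^25, so intermediate values near 2e7 are rounded in steps of 2. Whether that is the actual cause here is an assumption, not something confirmed in this report.

```python
import numpy as np

# Raw packed values, copied from the mask_and_scale=False output above.
raw = np.array([-32766, -32725, -32199], dtype=np.int16)

# Hypothetical packing attributes (NOT taken from the real file):
# chosen so that the raw value -32766 decodes to exactly 0.
scale_factor = 625.65
add_offset = 32766 * scale_factor


def decode(raw, scale_factor, add_offset, dtype):
    """Apply raw * scale_factor + add_offset entirely in the given float dtype."""
    return raw.astype(dtype) * dtype(scale_factor) + dtype(add_offset)


decoded64 = decode(raw, scale_factor, add_offset, np.float64)
decoded32 = decode(raw, scale_factor, add_offset, np.float32)

print("float64:", decoded64)
print("float32:", decoded32)
```

With these assumed attributes the float64 result for -32766 is exactly 0, while the float32 result is only guaranteed to be an even number close to 0. Comparing the two printouts against the real file's attributes would show whether precision plays a role.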
When the netCDF file is unpacked using the nco command line tool, the correct values are produced.
$ xarray.open_dataset("BIG_FILE_unpacked.nc").ssrd.isel(time=slice(0, 23)).sel(latitude=44.8, longitude=287.1, method="nearest").values
> array([ 0. , 0. , 0. ,
0. , 0. , 0. ,
0. , 0. , 0. ,
0. , 0. , 0. ,
25651.61906215, 354743.1221522 , 1091757.933255 ,
2170377.23235622, 3482363.69999847, 4704882.32554591,
5689654.23783437, 6297785.304381 , 6534906.36839455,
6543665.4578304 , 6543665.4578304 ])
Something else that may be relevant: another file containing the same packed data, but as a much smaller subset (1.7KB) of the big file, is unpacked correctly.
$ xarray.open_dataset("SMALL_FILE_packed.nc").ssrd.isel(time=slice(0, 23)).sel(latitude=44.8, longitude=287.1, method="nearest").values
> array([ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 25545.75, 354397.5 , 1091577. ,
2170077. , 3482645.8 , 4704689. , 5689927. , 6297856.5 ,
6535169. , 6543583. , 6543583. ], dtype=float32)
For this to be a truly verifiable example, I can transfer the 9GB file to someone or give instructions on how to download it from the climate API I’m getting it from. I’m not sure whether this is an issue with xarray, the API, or something I’m doing wrong. I’ve mostly been using an older version of xarray, but I also tested on the most recent version available via pip:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.9.6 (default, Jul 8 2021, 20:44:16) [GCC 5.4.0 20160609]
python-bits: 64
OS: Linux
OS-release: 4.4.0-200-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.18.2
pandas: 1.3.0
numpy: 1.21.0
scipy: None
netCDF4: 1.5.7
pydap: None
h5netcdf: 0.11.0
h5py: 3.3.0
Nio: None
zarr: None
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.9.0
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 51.3.3
pip: 20.3.3
conda: None
pytest: None
IPython: 7.25.0
sphinx: None
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (7 by maintainers)
@ohsqueezy You might also try engine="h5netcdf" (the h5py/h5netcdf packages are needed). And would it be possible to create a small subset of that file via netCDF4 to share?
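Both suggestions could be tried along these lines. This is a rough sketch, not a tested recipe: the helper names are hypothetical, the subset is written through xarray (which uses netCDF4 by default when it is installed) instead of the netCDF4 library directly, and the variable and coordinate names (time, latitude, longitude) are taken from the commands in the report.

```python
def open_with_h5netcdf(path):
    """Open the file with the h5netcdf backend instead of the default one."""
    import xarray as xr  # imported here so the helper can be defined without xarray loaded

    return xr.open_dataset(path, engine="h5netcdf")


def write_packed_subset(src, dst, n_times=23, lat=44.8, lon=287.1):
    """Write a small subset of `src` to `dst` without unpacking the data."""
    import xarray as xr

    # mask_and_scale=False leaves the int16 values and the scale_factor /
    # add_offset attributes as they are, so the subset stays packed.
    with xr.open_dataset(src, mask_and_scale=False) as ds:
        subset = ds.isel(time=slice(0, n_times)).sel(
            latitude=[lat], longitude=[lon], method="nearest"
        )
        subset.to_netcdf(dst)
```

For example, `write_packed_subset("BIG_FILE_packed.nc", "subset_packed.nc")` should produce a file small enough to attach, and opening it with and without `engine="h5netcdf"` would show whether the backend changes the decoded values.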