xarray: bottleneck: Wrong mean for float32 array
I think it is better to have this discussion here rather than on the dask
page: https://github.com/dask/dask/issues/2095
This is the reproducible “bug”:
ds = xarray.open_dataset('/opt/data/ERAIN/ERAIN-t2m-1983-2012.seasmean.nc')
ds.var167.mean()
Out[14]:
<xarray.DataArray 'var167' ()>
array(261.6441345214844)
ds.var167.data.mean()
Out[15]: 278.62466
The dataset is ~65 MB; the file is here: https://www.dropbox.com/s/xtj3fm7ihtbwd5r/ERAIN-t2m-1983-2012.seasmean.nc?dl=0 It is a fairly ordinary NetCDF file (no NaNs), just processed with CDO, as you can see in the dask issue.
About this issue
- State: open
- Created 7 years ago
- Comments: 19 (15 by maintainers)
I would rather pick option (1) above, that is, “Stop using bottleneck on float32 arrays”.
On second thought, we should add this to a FAQ page.
The difference is that Bottleneck does the sum in the naive way, whereas NumPy uses the more numerically stable pairwise summation.
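To make the naive-versus-pairwise difference concrete, here is a minimal NumPy-only sketch. It does not call Bottleneck or read the ERA-Interim file: the data is a synthetic constant field (an assumption), and `np.cumsum` with a float32 accumulator stands in for a naive running-total loop.

```python
import numpy as np

# Synthetic stand-in for the temperature field (an assumption, not the real data):
# 2**24 float32 values, all equal to 278.5, so the exact mean is 278.5.
n = 2**24
values = np.full(n, 278.5, dtype=np.float32)

# Naive summation: a single float32 running total, updated element by element.
# np.cumsum accumulates sequentially in the requested dtype, so its last
# entry is that running total after the final element.
naive_sum = np.cumsum(values, dtype=np.float32)[-1]
naive_mean = float(naive_sum) / n

# np.sum on a contiguous float32 array uses pairwise summation.
pairwise_mean = float(np.sum(values, dtype=np.float32)) / n

# Accumulating in float64 is another way to sidestep the problem.
float64_mean = float(values.mean(dtype=np.float64))

print("naive float32 mean:   ", naive_mean)
print("pairwise float32 mean:", pairwise_mean)
print("float64 mean:         ", float64_mean)
```

Once the running total grows large, the spacing between adjacent float32 values exceeds the size of each new element, so every addition is rounded heavily and the naive mean drifts away from 278.5 by several units. Pairwise summation only ever adds partial sums of similar magnitude, which keeps the rounding error small.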