xarray: bottleneck: Wrong mean for float32 array

I think it is better to have this discussion here rather than on the dask issue tracker: https://github.com/dask/dask/issues/2095

Here is how to reproduce the “bug”:

ds = xarray.open_dataset('/opt/data/ERAIN/ERAIN-t2m-1983-2012.seasmean.nc')
ds.var167.mean()
Out[14]: 
<xarray.DataArray 'var167' ()>
array(261.6441345214844)
ds.var167.data.mean()
Out[15]: 278.62466
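You don't need the file to see the effect. Below is a minimal sketch that uses synthetic data instead of the real dataset. It assumes a constant field of 278.6 K with about 4 million float32 values, which is the same order of magnitude as the ~65 MB file, so it will not reproduce the exact numbers above. It compares a sequential float32 running sum, emulated with `np.cumsum`, against `np.mean`, which uses pairwise summation on contiguous arrays.

```python
import numpy as np

# Assumption: synthetic stand-in for the ERA-Interim field, not the real file.
# ~4 million float32 values, all equal to 278.6 (a plausible 2 m temperature in K).
x = np.full(4_000_000, 278.6, dtype=np.float32)

# Naive summation: a single float32 running total, updated element by element.
# np.cumsum is a sequential accumulate, so its last element emulates that loop.
naive_mean = np.cumsum(x, dtype=np.float32)[-1] / np.float32(x.size)

# NumPy's mean on a contiguous float32 array uses pairwise summation.
pairwise_mean = x.mean()

# Accumulating in float64 avoids the problem whichever summation order is used.
float64_mean = x.mean(dtype=np.float64)

print(naive_mean, pairwise_mean, float64_mean)
```

Once the running total grows large, the float32 spacing between representable numbers exceeds a fraction of each value being added, so every addition gets rounded and the naive mean drifts visibly. In xarray, casting before reducing, e.g. `ds.var167.astype('float64').mean()`, should correspond to `float64_mean` here.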

The dataset is ~65 MB; the file is here: https://www.dropbox.com/s/xtj3fm7ihtbwd5r/ERAIN-t2m-1983-2012.seasmean.nc?dl=0 It is a fairly ordinary NetCDF file (no NaNs), just processed with CDO, as described in the dask issue.

About this issue

  • State: open
  • Created 7 years ago
  • Comments: 19 (15 by maintainers)

Most upvoted comments

Would it be worth emitting a warning (until a proper fix is found) when someone calls .mean() on a float32 DataArray?
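A minimal sketch of what such a warning could look like. `mean_warn_float32` is a hypothetical helper, not part of xarray. It assumes only that the input exposes `.dtype` and `.mean()`, which holds for both NumPy arrays and xarray DataArrays.

```python
import warnings

import numpy as np


def mean_warn_float32(arr):
    """Hypothetical helper: warn before taking the mean of float32 data.

    Duck-typed: works on anything with .dtype and .mean(),
    e.g. a NumPy array or an xarray DataArray.
    """
    if arr.dtype == np.float32:
        warnings.warn(
            "mean() on float32 data may lose precision; "
            "consider casting to float64 first",
            RuntimeWarning,
            stacklevel=2,
        )
    return arr.mean()
```

The warning fires on the dtype alone, not on the array's size, so it would also trigger for small arrays where the error is negligible. That trade-off is the main argument for a real fix over a warning.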

I would rather go with option (1) above, that is, “stop using bottleneck on float32 arrays”.

On second thought, we should add this to a FAQ page.

The difference is that Bottleneck does the sum in the naive way, whereas NumPy uses the more numerically stable pairwise summation.
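To illustrate the difference, here is a simplified sketch of both strategies. It assumes the naive version keeps a single float32 running total. `pairwise_sum` and its base block size of 8 are illustrative choices: NumPy's actual pairwise summation is implemented in C with larger, unrolled blocks, but it uses the same idea of recursive halving.

```python
import numpy as np


def naive_sum(x):
    """Left-to-right summation with a single float32 running total."""
    total = np.float32(0.0)
    for v in x:
        total = np.float32(total + v)  # rounded to float32 after every add
    return total


def pairwise_sum(x, block=8):
    """Recursive pairwise summation in float32 (simplified sketch)."""
    if len(x) <= block:
        return naive_sum(x)
    mid = len(x) // 2
    # Sum each half independently, then combine: partial sums stay of similar size.
    return np.float32(pairwise_sum(x[:mid], block) + pairwise_sum(x[mid:], block))


# One large value followed by 10,000 ones; the exact total is 100,010,000.
x = np.ones(10_001, dtype=np.float32)
x[0] = 1e8

print(naive_sum(x))             # each +1 is rounded away, total stays at 1e8
print(pairwise_sum(x))          # within a few tens of 100,010,000
print(x.sum(dtype=np.float64))  # exact total in float64
```

Near 1e8, float32 values are spaced 8 apart, so the naive loop loses every `+1`. The pairwise version first adds the ones to each other, which is exact, and only then combines them with the large value. This caps the rounding error at a few units per level of recursion.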