astropy: `MaskedQuantity` breaking `np.nanmean` etc. methods in `aggregate_downsample`

Description

A downstream issue lightkurve/lightkurve#1157 appears to have been triggered by the use of Masked objects, probably following #11127. The same code that returned a timeseries subclass with Quantity columns, “masked” with np.nan where applicable, with astropy 5.0rc1 is creating columns of MaskedQuantity. This is breaking any code using functions like np.nanmean that are currently not supported.

Expected behaviour

MaskedQuantity should either fully support required array functions or not be used where still unsupported functions are required.

Actual behaviour

For a more minimal example that can also be constructed in 4.3 (i.e. the aggregate_downsample failure is independently from its updates in 5.0) – note that the array functions do work for np.ma.array, but the NaNs don’t seem to be correctly caught in the downsampling:

Steps to Reproduce

import numpy as np
from astropy import units as u
from astropy.time import Time
from astropy.timeseries import Timeseries, aggregate_downsample
from astropy.utils.masked import Masked

ma = np.ma.array([[1, 2, 13, 248, np.nan, np.nan, 303, 12, 21]], mask=np.array([0, 0, 0, 1, 0, 1, 1, 0, 0])).T
np.nanmean(ma)
9.8

np.nanmean(ma.data)
85.71428571428571

mq = Masked(ma) * u.m
np.nanmean(qa)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in nanmean
TypeError: no implementation found for 'numpy.nanmean' on types that implement __array_function__: [<class 'astropy.utils.masked.core.MaskedQuantity'>]

tsa = TimeSeries(time=Time(np.arange(9)+4e4, format='mjd'), data=ma)
tsq = TimeSeries(time=Time(np.arange(9)+4e4, format='mjd'), data=mq)
aggregate_downsample(tsa, time_bin_size=3*u.d)
<BinnedTimeSeries length=3>
time_bin_start time_bin_size        col0      
                     s                        
    object        float64         float64     
-------------- ------------- -----------------
       40000.0      259200.0 5.333333333333333
40002.99999991      259200.0               nan
40005.99999982      259200.0              16.5

aggregate_downsample(tsq, time_bin_size=3*u.d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/lib/python3.10/site-packages/astropy/timeseries/downsample.py", line 127, in aggregate_downsample
    data[unique_indices] = u.Quantity(reduceat(values.value, groups, aggregate_func),
  File "/opt/lib/python3.10/site-packages/astropy/timeseries/downsample.py", line 28, in reduceat
    result.append(function(array[indices[i]:indices[i+1]]))
  File "<__array_function__ internals>", line 5, in nanmean
TypeError: no implementation found for 'numpy.nanmean' on types that implement __array_function__: [<class 'astropy.utils.masked.core.MaskedNDArray'>]

I am not sure how exactly the timeseries subclass in the downstream issue is instantiated, but the change to MaskedQuantity occurred with 5.0rc1 without any other code changes (@barentsen may have more background on the read-in details). So the code should probably be more conservative in using Masked objects for now, or functions like aggregate_downsample will need to escape those methods applying them either to the unmasked instances or their numpy content.

np.nanmean(np.ma.array(mq.value.data, mask=mq.mask))
9.8

Better still would of course be to directly support those functions. A naïve attempt to just add them to function_helpers.MASKED_SAFE_FUNCTIONS actually (sort of) worked for np.nanmedian, but np.nanmean, np.nansum, np.nanprod` then fail with

File "/opt/lib/python3.10/site-packages/numpy/lib/nanfunctions.py", line 937, in nanmean
    arr, mask = _replace_nan(a, 0)
  File "/opt/lib/python3.10/site-packages/numpy/lib/nanfunctions.py", line 108, in _replace_nan
    np.copyto(a, val, where=mask)
  File "<__array_function__ internals>", line 5, in copyto
TypeError: no implementation found for 'numpy.copyto' on types that implement __array_function__: [<class 'astropy.utils.masked.core.MaskedNDArray'>]

so they will probably need a dedicated implementation – analogously to MaskedIterator.mean?

System Details

Python: 3.8.12 / 3.9.8 / 3.10.0 Platform: macOS-10.14.6-x86_64-i386-64bit Astropy: 5.0rc1 Numpy: 1.21.2 Scipy: 1.7.1 Matplotlib: 3.5.0rc1

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 18 (18 by maintainers)

Most upvoted comments

I’ll try to push my fix for the nanfunctions later today - it was mostly done…

Also, I am now seeing that np.nansum never returns NaN, not even for all-NaN arrays, so I guess it should not return Masked arrays either. Must admit I find this somewhat inconsistent…

I understand that as sum([]) just as with all-masked arrays, so returning 0.0 looks logical to me. Likewise for nanmean, which would then turn into 0.0 / len([]) = np.nan.