dask: Cannot compute min or max of dates in dask array when converted from dask dataframe using to_dask_array

What happened:

Computing the minimum date of a dask array raises an exception when that array was created from a dask dataframe using to_dask_array.

What you expected to happen:

I expected to be able to compute the minimum date from the dask array.

Minimal Complete Verifiable Example:

Using to_dask_array raises an exception:

import pandas as pd
from datetime import date
import dask.dataframe as dd

dates_df = pd.Series(pd.date_range(date(2014,1,1),date(2015,1,1), freq="M"))
dates_dd = dd.from_pandas(dates_df, npartitions=1)

dates_da = dates_dd.to_dask_array()
print(dates_da.min().compute())

UFuncTypeError: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')
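
For context on the message itself, here is a minimal sketch in plain NumPy (no dask involved, so it does not show where dask performs the addition). It only illustrates that a comparison-based reduction such as min is defined for datetime64 values, while adding two datetime64 values is not, which matches the wording of the error above. The sample dates are made up for illustration.

```python
import numpy as np

# Hypothetical sample data: three month-end dates as datetime64[ns].
dates = np.array(["2014-01-31", "2014-02-28", "2014-03-31"], dtype="datetime64[ns]")

# min only compares values, which datetime64 supports.
earliest = dates.min()
print(earliest)

# Adding two datetime64 values is not defined, so np.add raises
# UFuncTypeError, which is a subclass of TypeError.
try:
    np.add(dates, dates)
    add_failed = False
except TypeError as exc:
    add_failed = True
    print(type(exc).__name__, exc)
```

So any code path that adds or sums datetime64 values, even while only working out a result type, would fail this way regardless of whether the requested operation is min or max.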

Using dd.from_array works as expected:

import pandas as pd
from datetime import date
import dask.dataframe as dd

dates_df = pd.Series(pd.date_range(date(2014,1,1),date(2015,1,1), freq="M"))

dates_da = dd.from_array(dates_df.values)
print(dates_da.min().compute())

2014-01-31 00:00:00

Environment:

  • Dask version: 2.16.0
  • Python version: 3.7.6
  • Operating System: macOS Catalina 10.15.5
  • Install method (conda, pip, source): conda

About this issue

  • State: open
  • Created 4 years ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

Fixable by exposing meta as an argument to to_dask_array.

I think it’s just in Array.min(), Array.sum(), etc.: anything that goes through reduction.