cudf: min()/max() on empty series returns a large/small value instead of nan

Reporting a bug

  • I am using the latest version of PyGDF from conda or built from master.
  • I have included the following environment details: Linux Distro: Ubuntu 16.04, Linux Kernel: Linux 4.15.0-29-generic x86_64 , GPU Model: TITAN V
  • I have included the following version information for: Arrow: 0.7.1, CUDA: 9.2.148, Numpy: 1.14.5, Pandas: 0.23.3, Python: 3.6.6
  • I have included below a minimal working reproducer (if you are unsure how to write one see http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).

Calling min() or max() on an empty series returns the largest or smallest representable value for the series' dtype, respectively.

import pygdf
from pygdf import Series

# An empty series; the output below suggests it defaults to float64.
gseries = Series([])
print(gseries.min())
print(gseries.max())

Output:

1.7976931348623157e+308
-1.7976931348623157e+308

Pandas returns NaN in the same situation.
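For reference, a minimal pandas comparison (the dtype is pinned to float64 here to match the reproducer above; with no dtype, an empty list may be inferred differently):

import numpy as np
import pandas as pd

pseries = pd.Series([], dtype=np.float64)
print(pseries.min())  # nan
print(pseries.max())  # nan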


Workaround: check whether the series is empty before calling min()/max(), as sketched below.
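A minimal sketch of that workaround, assuming the pygdf Series supports len() and using math.nan as the fallback to mirror pandas (any sentinel that suits your pipeline would work):

import math
from pygdf import Series

gseries = Series([])
# Guard against the empty case before reducing, falling back to NaN like pandas.
minimum = gseries.min() if len(gseries) > 0 else math.nan
maximum = gseries.max() if len(gseries) > 0 else math.nan
print(minimum, maximum)  # nan nan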

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 23 (13 by maintainers)

Most upvoted comments

The general aim of cudf is to provide a pandas-like API for GPU-accelerated workflows. I think this gives us a general guideline for the decisions we were facing in this issue.

However, as seen above, pandas behavior has changed over time, and that history also influences the cudf code. Currently, cudf supports only pandas version 0.20.3, which raises an interesting question: should we reproduce the pandas mistakes of the past, or should we skip them and aim to match the behavior of recent pandas versions?

We will be bumping support to the latest Pandas version in #668, so let's aim for the newest behavior.