arrow: [Python] Inconsistent cast behavior between array and scalar for int64

Describe the bug, including details regarding any error messages, version, and platform.

>>> scal = pa.scalar(6312878760374611856, type=pa.int64())
>>> scal.cast(pa.float64())
<pyarrow.DoubleScalar: 6.312878760374612e+18>
>>> arr = pa.array([6312878760374611856], type=pa.int64())
>>> arr.cast(pa.float64())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow\array.pxi", line 926, in pyarrow.lib.Array.cast
  File "C:\pandas2_ps_04323\lib\site-packages\pyarrow\compute.py", line 391, in cast
    return call_function("cast", [arr], options)
  File "pyarrow\_compute.pyx", line 560, in pyarrow._compute.call_function
  File "pyarrow\_compute.pyx", line 355, in pyarrow._compute.Function.call
  File "pyarrow\error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Integer value 6312878760374611856 not in range: -9007199254740992 to 9007199254740992

Casting behavior is not consistent between arrays and scalars. The array behavior of raising does not seem correct, since an int64 should always be castable to float64 (possibly with loss of precision).
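
For comparison, the array cast does succeed once safety checks are disabled (safe=False on Array.cast), which matches the scalar result above:

>>> arr.cast(pa.float64(), safe=False)[0]
<pyarrow.DoubleScalar: 6.312878760374612e+18>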

Component(s)

Python

About this issue

  • State: closed
  • Created a year ago
  • Comments: 16 (5 by maintainers)

Most upvoted comments

Thanks for raising this issue by the way. I don’t think I expressed that earlier. Your contributions are appreciated!

I’d recommend filing an issue with numpy about this, too:

>>> import numpy as np
>>> np.__version__
'1.24.2'

# Bug: No safety error for initial int64 -> float64 conversion
>>> np.array([18014398509481983]).astype("float64", casting="safe").astype(str)
array(['1.8014398509481984e+16'], dtype='<U32')

But in the case where the cast actually is safe, e.g. for 18,014,398,509,481,984, shouldn’t pyarrow then succeed? In my example, the array cast still raises even when the value is exactly representable. Should it only raise for 18,014,398,509,481,983?

If pyarrow were to follow the floating point specification exactly, then yes, it would. Right now this seems to be a limitation of the implementation. You could argue that option (3) in the list below should be classified as a bug rather than a feature.
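
For context on why 18,014,398,509,481,984 is the interesting boundary: it is 2**54, and float64 can represent it exactly (between 2**53 and 2**54, every even integer is representable), whereas 2**54 - 1 silently rounds up. A quick check in plain Python:

>>> float(18014398509481984) == 18014398509481984   # 2**54 is exactly representable
True
>>> int(float(18014398509481983))                    # 2**54 - 1 rounds up to 2**54
18014398509481984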

For pyarrow, we should probably:

  1. Allow both safe and unsafe conversion options for scalar APIs (feature)
  2. Default to safe conversion for scalars, which does not appear to be happening currently (bug)
  3. Look into allowing safe conversion from int <-> float for valid numbers larger than 2^53 (feature); a sketch follows below
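
To make (3) concrete, here is a minimal sketch of what an exactness-based check could look like: do the lossy cast, cast back, and accept only values that survive the round trip. The helper name is hypothetical (this is not the existing kernel), and nulls and values near INT64_MAX are ignored for brevity.

import pyarrow as pa
import pyarrow.compute as pc

def cast_int64_to_float64_if_exact(arr):
    # Hypothetical helper: instead of rejecting every value outside +/- 2**53,
    # reject only the values that would actually lose precision in float64.
    as_float = arr.cast(pa.float64(), safe=False)        # lossy cast always succeeds
    round_trip = as_float.cast(pa.int64(), safe=False)   # back to int64
    if not pc.all(pc.equal(round_trip, arr)).as_py():    # lossless iff the round trip matches
        raise pa.ArrowInvalid("int64 value not exactly representable as float64")
    return as_float

>>> cast_int64_to_float64_if_exact(pa.array([2**54], type=pa.int64()))      # exact, succeeds
>>> cast_int64_to_float64_if_exact(pa.array([2**54 - 1], type=pa.int64()))  # not exact, raises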

@AlenkaF @jorisvandenbossche what do you think?