arrow: [Python] Inconsistent cast behavior between array and scalar for int64

Describe the bug, including details regarding any error messages, version, and platform.

>>> scal = pa.scalar(6312878760374611856, type=pa.int64())
>>> scal.cast(pa.float64())
<pyarrow.DoubleScalar: 6.312878760374612e+18>
>>> arr = pa.array([6312878760374611856], type=pa.int64())
>>> arr.cast(pa.float64())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow\array.pxi", line 926, in pyarrow.lib.Array.cast
  File "C:\pandas2_ps_04323\lib\site-packages\pyarrow\compute.py", line 391, in cast
    return call_function("cast", [arr], options)
  File "pyarrow\_compute.pyx", line 560, in pyarrow._compute.call_function
  File "pyarrow\_compute.pyx", line 355, in pyarrow._compute.Function.call
  File "pyarrow\error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Integer value 6312878760374611856 not in range: -9007199254740992 to 9007199254740992

Casting behavior is not consistent between arrays and scalars. The array behavior of raising does not seem correct, since an int64 should always be castable to float64 (possibly with loss of precision).
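
For comparison, the array cast does succeed once safety checks are disabled (safe=False on Array.cast), which matches the scalar result above:

>>> arr.cast(pa.float64(), safe=False)[0]
<pyarrow.DoubleScalar: 6.312878760374612e+18>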

Component(s)

Python

About this issue

  • State: closed
  • Created a year ago
  • Comments: 16 (5 by maintainers)

Most upvoted comments

Thanks for raising this issue by the way. I don’t think I expressed that earlier. Your contributions are appreciated!

I’d recommend filing an issue with numpy about this, too:

>>> import numpy as np
>>> np.__version__
'1.24.2'

# Bug: No safety error for initial int64 -> float64 conversion
>>> np.array([18014398509481983]).astype("float64", casting="safe").astype(str)
array(['1.8014398509481984e+16'], dtype='<U32')

But in the case where the cast actually is safe, e.g. for 18,014,398,509,481,984, shouldn’t pyarrow then succeed? In my example, the array cast still raises even when the value is exactly representable. Should it only raise for 18,014,398,509,481,983?

If pyarrow were to follow the floating point specification exactly, then yes, it would. Right now this seems to be a limitation of the implementation. You could argue that option (3) in the list below should be classified as a bug rather than a feature.
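
For context on why 18,014,398,509,481,984 is the interesting boundary: it is 2**54, and float64 can represent it exactly (between 2**53 and 2**54, every even integer is representable), whereas 2**54 - 1 silently rounds up. A quick check in plain Python:

>>> float(18014398509481984) == 18014398509481984   # 2**54 is exactly representable
True
>>> int(float(18014398509481983))                    # 2**54 - 1 rounds up to 2**54
18014398509481984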

For pyarrow, we should probably:

  1. Allow both safe and unsafe conversion options for scalar APIs (feature)
  2. Default to safe conversion for scalars, which does not appear to be happening currently (bug)
  3. Look into allowing safe conversion from int <-> float for valid numbers larger than 2^53 (feature); a sketch follows below
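
To make (3) concrete, here is a minimal sketch of what an exactness-based check could look like: do the lossy cast, cast back, and accept only values that survive the round trip. The helper name is hypothetical (this is not the existing kernel), and nulls and values near INT64_MAX are ignored for brevity.

import pyarrow as pa
import pyarrow.compute as pc

def cast_int64_to_float64_if_exact(arr):
    # Hypothetical helper: instead of rejecting every value outside +/- 2**53,
    # reject only the values that would actually lose precision in float64.
    as_float = arr.cast(pa.float64(), safe=False)        # lossy cast always succeeds
    round_trip = as_float.cast(pa.int64(), safe=False)   # back to int64
    if not pc.all(pc.equal(round_trip, arr)).as_py():    # lossless iff the round trip matches
        raise pa.ArrowInvalid("int64 value not exactly representable as float64")
    return as_float

>>> cast_int64_to_float64_if_exact(pa.array([2**54], type=pa.int64()))      # exact, succeeds
>>> cast_int64_to_float64_if_exact(pa.array([2**54 - 1], type=pa.int64()))  # not exact, raises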

@AlenkaF @jorisvandenbossche what do you think?