cudf: [BUG] Fixed-point types with precision < 10 (for Spark) cannot be successfully read back from parquet
Describe the bug
Fixed-point columns with precision < 10 (Spark tracks precision for decimals) cannot be successfully read back after being written to parquet. I suspect the columns are being written as plain integers, since Spark tries to read them back with readLong when a readDecimal was expected. This is just a hunch, and the real problem might be completely different.
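For context, the Parquet format spec lets a DECIMAL annotate INT32 (precision <= 9), INT64 (precision <= 18), or a byte-array type, and Spark both writes and expects to read decimals according to that mapping. Below is a minimal sketch of that rule; the function name is illustrative, not a cudf or Spark API.

```python
# Minimal sketch of the Parquet spec's DECIMAL physical-type mapping,
# which a Spark-style reader assumes. Illustrative only; not a real API.
def expected_physical_type(precision: int) -> str:
    """Physical type a Spark-style reader expects for DECIMAL(precision, scale)."""
    if precision <= 9:
        return "INT32"                 # read via the 32-bit integer path
    if precision <= 18:
        return "INT64"                 # read via readLong
    return "FIXED_LEN_BYTE_ARRAY"      # read via the binary/readDecimal path

# A precision-5 decimal is expected as INT32 by this mapping:
assert expected_physical_type(5) == "INT32"
```

If the writer instead mirrors the in-memory width (e.g., a decimal64 column written as INT64 even at precision 5), a reader that picks its decoder from the declared precision takes the wrong read path, which would match the readLong-vs-readDecimal mismatch described above.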
Steps/Code to reproduce bug
Create a parquet file containing decimal columns with precision < 10, as sketched below.
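A minimal repro sketch, assuming a recent cudf Python build (Decimal64Dtype and the exact Series construction may vary by version); the file and column names are placeholders:

```python
import cudf
from decimal import Decimal
import pyarrow.parquet as pq

# Build a decimal column with precision < 10 and write it to parquet.
s = cudf.Series(
    [Decimal("1.23"), Decimal("4.56")],
    dtype=cudf.Decimal64Dtype(precision=5, scale=2),
)
cudf.DataFrame({"val": s}).to_parquet("round_trip.parquet")

# Inspect what was actually written; per the mapping above, Spark expects
# a precision-5 decimal to have physical type INT32.
print(pq.read_schema("round_trip.parquet"))

# Reading the same file back through Spark (spark.read.parquet(...)) is
# where the failure shows up.
```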
Expected behavior
Reading the parquet file back with a third-party reader (I used Spark) should work without any problems.
Additional context
Please see the attached parquet file ROUND_TRIP.tar.gz
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 26 (17 by maintainers)
Closing for now because Spark-RAPIDS has several good workarounds.
One option would be to let the writer specify the physical type for any input type, with a toggle in the input schema to control it (see the hypothetical sketch below).
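A purely hypothetical sketch of what that proposal could look like in cudf's Python writer; neither keyword below exists in cudf, they only illustrate the idea of a per-column physical-type override plus a schema-level toggle:

```python
# HYPOTHETICAL: neither keyword below exists in cudf. This only sketches
# the proposal of a per-column physical-type override plus a toggle that
# switches the writer to the Parquet spec's decimal mapping.
cudf.DataFrame({"val": s}).to_parquet(
    "out.parquet",
    column_physical_types={"val": "INT32"},   # hypothetical override
    use_spec_decimal_mapping=True,            # hypothetical toggle
)
```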
+1 for keeping the current behavior plus a warning, mainly because we expose the type size to users, and keeping the input type seems like the least surprising behavior.
FYI, the Spark PR is now merged.