cudf: [BUG] Fixed-point types with precision < 10 (for Spark) cannot be successfully read back from parquet

Describe the bug
When writing parquet files with fixed-point types, columns with precision < 10 (Spark uses the decimal precision) cannot be successfully read back after being written to parquet. I think the columns are being written as plain ints, so Spark tries to read them back with readLong when a readDecimal was expected. This is just a hunch and the real problem might be completely different.
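For context, the Parquet format allows a DECIMAL logical type to be backed by INT32 when the precision is at most 9 and by INT64 when it is at most 18, and readers such as Spark generally pick their read path from the declared precision. A minimal sketch of that spec rule (not cudf code, just the mapping):

```python
def expected_parquet_physical_type(precision: int) -> str:
    """Smallest physical type the Parquet spec allows for a DECIMAL of this precision."""
    if precision <= 9:
        return "INT32"             # values fit in 32 bits
    if precision <= 18:
        return "INT64"             # values fit in 64 bits
    return "FIXED_LEN_BYTE_ARRAY"  # larger precisions need a byte array
```

If the writer stores a precision < 10 column with a wider physical type than the reader expects for that precision, a round trip through a third-party reader can break in exactly this way.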

Steps/Code to reproduce bug
Create a parquet file with Decimals using precisions < 10.
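A minimal repro sketch of the write side, assuming a cudf build that exposes Decimal64Dtype at the top level (older releases may need cudf.core.dtypes.Decimal64Dtype); the file name and values are arbitrary:

```python
import cudf

# A decimal type with precision 9 (< 10) and scale 2
dec_dtype = cudf.Decimal64Dtype(precision=9, scale=2)

# Build a small decimal column and write it to parquet
df = cudf.DataFrame({"dec_col": cudf.Series([12345, 67890]).astype(dec_dtype)})
df.to_parquet("round_trip.parquet")

# Reading the file back with a third-party reader is where the failure shows up,
# e.g. in Spark:  spark.read.parquet("round_trip.parquet").show()
```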

Expected behavior
Reading the parquet file back with a third-party reader (I used Spark) should work without any problem.

Additional context
Please see the attached parquet file ROUND_TRIP.tar.gz.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 26 (17 by maintainers)

Most upvoted comments

Closing for now because Spark-RAPIDS has several good workarounds.

The option to convert to 32-bit if precision < 10?
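For illustration only, a hedged sketch of what that conversion could look like on the caller's side, assuming a cudf build that provides Decimal32Dtype and fixed-point casts; downcast_small_decimals is a hypothetical helper, and whether this actually avoids the read-back mismatch depends on how the writer maps the 32-bit fixed-point type to a physical type:

```python
import cudf

def downcast_small_decimals(df: cudf.DataFrame) -> cudf.DataFrame:
    """Cast 64-bit decimal columns with precision < 10 to a 32-bit decimal dtype
    before writing, so they can be stored with a narrower representation."""
    out = df.copy()
    for name in df.columns:
        dt = df[name].dtype
        if isinstance(dt, cudf.Decimal64Dtype) and dt.precision < 10:
            out[name] = df[name].astype(cudf.Decimal32Dtype(dt.precision, dt.scale))
    return out

# Example usage (hypothetical file name):
# downcast_small_decimals(df).to_parquet("round_trip.parquet")
```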

The option to specify the physical type for any input type, plus a toggle in the input schema to control that.

+1 for keeping the current behavior plus a warning, mainly because we expose the type size to users and keeping the input type seems like the least surprising behavior.

FYI, the Spark PR is now merged.