cudf: [BUG] Fixed-point types with precision < 10 (for Spark) cannot be successfully read back from parquet

Describe the bug
When writing parquet files with fixed-point types, columns with precision < 10 (Spark uses the decimal precision) cannot be successfully read back after being written to parquet. I think the columns are being written as plain ints, so Spark tries to read them back with readLong when a readDecimal was expected. This is just a hunch and the real problem might be completely different.
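For context, the Parquet format allows a DECIMAL logical type to be backed by INT32 when the precision is at most 9 and by INT64 when it is at most 18, and readers such as Spark generally pick their read path from the declared precision. A minimal sketch of that spec rule (not cudf code, just the mapping):

```python
def expected_parquet_physical_type(precision: int) -> str:
    """Smallest physical type the Parquet spec allows for a DECIMAL of this precision."""
    if precision <= 9:
        return "INT32"             # values fit in 32 bits
    if precision <= 18:
        return "INT64"             # values fit in 64 bits
    return "FIXED_LEN_BYTE_ARRAY"  # larger precisions need a byte array
```

If the writer stores a precision < 10 column with a wider physical type than the reader expects for that precision, a round trip through a third-party reader can break in exactly this way.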

Steps/Code to reproduce bug
Create a parquet file with Decimals using precisions < 10.
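A minimal repro sketch of the write side, assuming a cudf build that exposes Decimal64Dtype at the top level (older releases may need cudf.core.dtypes.Decimal64Dtype); the file name and values are arbitrary:

```python
import cudf

# A decimal type with precision 9 (< 10) and scale 2
dec_dtype = cudf.Decimal64Dtype(precision=9, scale=2)

# Build a small decimal column and write it to parquet
df = cudf.DataFrame({"dec_col": cudf.Series([12345, 67890]).astype(dec_dtype)})
df.to_parquet("round_trip.parquet")

# Reading the file back with a third-party reader is where the failure shows up,
# e.g. in Spark:  spark.read.parquet("round_trip.parquet").show()
```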

Expected behavior
Reading the parquet file back with a third-party reader (I used Spark) should work without any problem.

Additional context
Please see the attached parquet file ROUND_TRIP.tar.gz.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 26 (17 by maintainers)

Most upvoted comments

Closing for now because Spark-RAPIDS has several good workarounds.

The option to convert to 32-bit if precision < 10?
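For illustration only, a hedged sketch of what that conversion could look like on the caller's side, assuming a cudf build that provides Decimal32Dtype and fixed-point casts; downcast_small_decimals is a hypothetical helper, and whether this actually avoids the read-back mismatch depends on how the writer maps the 32-bit fixed-point type to a physical type:

```python
import cudf

def downcast_small_decimals(df: cudf.DataFrame) -> cudf.DataFrame:
    """Cast 64-bit decimal columns with precision < 10 to a 32-bit decimal dtype
    before writing, so they can be stored with a narrower representation."""
    out = df.copy()
    for name in df.columns:
        dt = df[name].dtype
        if isinstance(dt, cudf.Decimal64Dtype) and dt.precision < 10:
            out[name] = df[name].astype(cudf.Decimal32Dtype(dt.precision, dt.scale))
    return out

# Example usage (hypothetical file name):
# downcast_small_decimals(df).to_parquet("round_trip.parquet")
```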

The option to specify the physical type for any input type, plus a toggle in the input schema to control that.

+1 for keeping the current behavior plus a warning, mainly because we expose the type size to users and keeping the input type seems like the least surprising behavior.

FYI, the Spark PR is now merged.