cudf: [FEA] Improve exception message when unknown Parquet page encoding detected
Is your feature request related to a problem? Please describe.
A user of the RAPIDS Accelerator for Apache Spark reported the following exception:
ai.rapids.cudf.CudfException: CUDF failure at: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-424-cuda11/thirdparty/cudf/cpp/src/io/parquet/reader_impl_preprocess.cu:346: Unsupported page encoding detected
at ai.rapids.cudf.ParquetChunkedReader.readChunk(Native Method)
at ai.rapids.cudf.ParquetChunkedReader.readChunk(ParquetChunkedReader.java:111)
at com.nvidia.spark.rapids.ParquetTableReader.$anonfun$next$1(GpuParquetScan.scala:2639)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
at com.nvidia.spark.rapids.ParquetTableReader.next(GpuParquetScan.scala:2638)
at com.nvidia.spark.rapids.ParquetTableReader.next(GpuParquetScan.scala:2615)
at com.nvidia.spark.rapids.CachedGpuBatchIterator$.$anonfun$apply$1(GpuDataProducer.scala:161)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
at com.nvidia.spark.rapids.CachedGpuBatchIterator$.apply(GpuDataProducer.scala:158
The exception message from libcudf is not very helpful: it says an unsupported page encoding was detected, but not which encoding it was (i.e., the enum value). Without this information, we’re left guessing what encoding was found in the file and usually have to ask users to share a sample file to find out. Not all users are willing to share sample files.
Describe the solution you’d like
Exception messages for an unexpected/unsupported value should show the value as part of the exception message.
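As a rough illustration of the request, the sketch below appends the offending value to the message. It is not libcudf code: `check_encoding_supported` is a hypothetical helper, `std::runtime_error` stands in for whatever exception type libcudf throws, and the rule that value 9 is the unsupported one is assumed purely for the example.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not a libcudf function): rejects an encoding value and
// reports the value itself so the reader of the exception can identify it.
void check_encoding_supported(int encoding)
{
  // Assumption for illustration only: value 9 is treated as unsupported.
  bool const supported = (encoding != 9);
  if (!supported) {
    throw std::runtime_error("Unsupported page encoding detected: " +
                             std::to_string(encoding));
  }
}
```

With a message of this shape, the stack trace alone would be enough to tell which encoding the file used, without needing a sample file.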
About this issue
- Original URL
- State: closed
- Created 9 months ago
- Comments: 16 (16 by maintainers)
Ah I see. I was fixated on the old CUDF_EXPECTS and std::all code (which is no longer there), so I only skimmed through the new code. But now I see that there is still work to be done. I apologize, I end up having to do a lot of context switching so I missed this. Thanks
Thank you so much! I assumed that the good first issue label was for easier work. In that case, I’ll do https://github.com/rapidsai/cudf/issues/14661 and depending on how that goes, hopefully I can pick a few more from the backlog. Thanks again 😃
I was talking about the std::all_of call 😉 That’s my old detection code from #12754.

This has not yet been fully addressed. Right now the error code for an unsupported option is set during header parsing and is available on the host side in decode_page_headers. All that needs to be done is to test the returned error code for the correct bit, then do a transform_reduce on the page encoding field (after turning the encoding into a mask), and then check the set bits to find the offending encoding(s). Soon the only unsupported encoding will be byte_stream_split, so this hasn’t been a high priority for me. Good first issue.
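The mask-and-reduce step described above could look roughly like the following host-side sketch. It uses `std::transform_reduce` from the C++17 standard library as a stand-in for a device-side reduction, and it is not the libcudf implementation: `PageInfo`, `encoding_to_mask`, `supported_mask`, `unsupported_encodings`, and `describe_unsupported` are hypothetical names introduced here. The enum values follow the Parquet format specification; treating BYTE_STREAM_SPLIT as the only unsupported encoding is an assumption taken from the comment above.

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Encoding values as listed in the Parquet format specification.
enum class Encoding : uint8_t {
  PLAIN                   = 0,
  PLAIN_DICTIONARY        = 2,
  RLE                     = 3,
  BIT_PACKED              = 4,
  DELTA_BINARY_PACKED     = 5,
  DELTA_LENGTH_BYTE_ARRAY = 6,
  DELTA_BYTE_ARRAY        = 7,
  RLE_DICTIONARY          = 8,
  BYTE_STREAM_SPLIT       = 9
};

// Hypothetical, minimal stand-in for per-page metadata.
struct PageInfo {
  Encoding encoding;
};

// Turn an encoding into a single-bit mask: bit N is set for enum value N.
constexpr uint32_t encoding_to_mask(Encoding e)
{
  return uint32_t{1} << static_cast<uint8_t>(e);
}

// Assumption for this sketch: everything except BYTE_STREAM_SPLIT is supported.
constexpr uint32_t supported_mask =
  encoding_to_mask(Encoding::PLAIN) | encoding_to_mask(Encoding::PLAIN_DICTIONARY) |
  encoding_to_mask(Encoding::RLE) | encoding_to_mask(Encoding::BIT_PACKED) |
  encoding_to_mask(Encoding::DELTA_BINARY_PACKED) |
  encoding_to_mask(Encoding::DELTA_LENGTH_BYTE_ARRAY) |
  encoding_to_mask(Encoding::DELTA_BYTE_ARRAY) | encoding_to_mask(Encoding::RLE_DICTIONARY);

// OR together the masks of all pages, then keep only the unsupported bits.
uint32_t unsupported_encodings(std::vector<PageInfo> const& pages)
{
  uint32_t const seen = std::transform_reduce(
    pages.begin(), pages.end(), uint32_t{0}, std::bit_or<uint32_t>{},
    [](PageInfo const& p) { return encoding_to_mask(p.encoding); });
  return seen & ~supported_mask;
}

// Build a message listing the enum value of every set bit in the mask.
std::string describe_unsupported(uint32_t mask)
{
  std::string values;
  for (int bit = 0; bit < 32; ++bit) {
    if (mask & (uint32_t{1} << bit)) {
      if (!values.empty()) { values += ", "; }
      values += std::to_string(bit);
    }
  }
  return "Unsupported page encoding detected: " + values;
}
```

A caller would run `unsupported_encodings` only after the error code reports the unsupported-option bit, and pass a non-zero result to `describe_unsupported` to build the exception text. Reducing to a bitmask keeps the result a single integer regardless of page count, which is why it fits a reduction over the page array.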