trino: Required field 'uncompressed_page_size' was not found in serialized data! Struct:
Hi, please refer to the error below that appears when accessing Parquet; kindly help. I have enclosed a sample file downloaded from Kaggle; when I test it, the same error appears in the Presto CLI.
I am using Presto version 326 …
io.prestosql.spi.PrestoException: can not read class org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@33fc99f6
at io.prestosql.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:167)
at io.prestosql.spi.block.LazyBlock$LazyData.load(LazyBlock.java:378)
at io.prestosql.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:357)
at io.prestosql.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:275)
at io.prestosql.spi.Page.getLoadedPage(Page.java:261)
at io.prestosql.operator.TableScanOperator.getOutput(TableScanOperator.java:290)
at io.prestosql.operator.Driver.processInternal(Driver.java:379)
at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
at io.prestosql.operator.Driver.processFor(Driver.java:276)
at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
at io.prestosql.$gen.Presto_326____20191205_193016_2.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@33fc99f6
at org.apache.parquet.format.Util.read(Util.java:216)
at org.apache.parquet.format.Util.readPageHeader(Util.java:65)
at io.prestosql.parquet.reader.ParquetColumnChunk.readPageHeader(ParquetColumnChunk.java:57)
at io.prestosql.parquet.reader.ParquetColumnChunk.readAllPages(ParquetColumnChunk.java:67)
at io.prestosql.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:256)
at io.prestosql.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:310)
at io.prestosql.parquet.reader.ParquetReader.readBlock(ParquetReader.java:293)
at io.prestosql.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:161)
... 16 more
Caused by: io.prestosql.hive.$internal.parquet.org.apache.thrift.protocol.TProtocolException: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@33fc99f6
at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1055)
at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:966)
at org.apache.parquet.format.PageHeader.read(PageHeader.java:843)
at org.apache.parquet.format.Util.read(Util.java:213)
... 23 more
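For what it's worth, the stack trace fails while parsing a Thrift-encoded PageHeader, so one way to narrow things down is to read the same file with the plain Apache Parquet Java API outside Presto. The sketch below is not from the original report and uses a placeholder path; if it reads the file cleanly, the page headers in the file itself are intact and the failure is more likely in the reading layer (or a cache in front of it) than in the file.

```java
// A minimal sketch, not from the original report: read every row group of the sample
// file with the plain Apache Parquet Java API. readNextRowGroup() decodes the page
// headers of each column chunk, so a truncated or corrupt header fails here in the
// same way as in the stack trace above. The path is a placeholder.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.column.page.PageReadStore;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class ParquetSanityCheck
{
    public static void main(String[] args)
            throws Exception
    {
        Path path = new Path("file:///tmp/sample.parquet"); // placeholder path
        Configuration conf = new Configuration();

        try (ParquetFileReader reader = ParquetFileReader.open(HadoopInputFile.fromPath(path, conf))) {
            long rows = 0;
            PageReadStore rowGroup;
            while ((rowGroup = reader.readNextRowGroup()) != null) {
                rows += rowGroup.getRowCount();
            }
            System.out.println("Read " + rows + " rows without errors");
        }
    }
}
```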
Hi hashhar, thanks for the information. Yes, I experienced the same kind of problem: the underlying data platform's jars did not support reading huge table data. The vendor has fixed the issue; no changes were needed on the Presto side…
@sib19 If you're still here, can you please confirm whether the data source you were querying has some ETL job running on it that re-writes files? Or whether you were using some kind of caching layer (e.g. Rubix)?
I ran into this issue today, and in my case it happened (most probably) due to Rubix. After dropping the cached files for that table, the error went away. I hit this with Presto 333.
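If stale caching is the suspect, one way to check is to compare the cached copy against the source file it was taken from. This is a hypothetical sketch with placeholder paths, not something from this thread:

```java
// A hypothetical staleness check, not part of the original thread: compare the cached
// copy of a Parquet file against the source it was taken from. If an ETL job rewrote
// the source after the cache was populated, the sizes or modification times disagree,
// and reading the stale copy can fail mid-page-header exactly as above.
// Both paths are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

public class StaleCacheCheck
{
    public static void main(String[] args)
            throws Exception
    {
        Path source = new Path("hdfs://namenode/warehouse/sales/part-00000.parquet"); // placeholder
        Path cached = new Path("file:///var/lib/cache/sales/part-00000.parquet");     // placeholder

        Configuration conf = new Configuration();
        FileStatus sourceStatus = source.getFileSystem(conf).getFileStatus(source);
        FileStatus cachedStatus = cached.getFileSystem(conf).getFileStatus(cached);

        if (sourceStatus.getLen() != cachedStatus.getLen()
                || sourceStatus.getModificationTime() > cachedStatus.getModificationTime()) {
            System.out.println("Cached copy looks stale; drop it and re-read from the source");
        }
        else {
            System.out.println("Cached copy matches the source");
        }
    }
}
```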