trino: Required field 'uncompressed_page_size' was not found in serialized data! Struct:

Hi, please refer to the error below when accessing Parquet; kindly help. I have enclosed a sample file downloaded from Kaggle and tested it; the same error below appears in the Presto CLI.

I am using Presto version 326 …

io.prestosql.spi.PrestoException: can not read class org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@33fc99f6
        at io.prestosql.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:167)
        at io.prestosql.spi.block.LazyBlock$LazyData.load(LazyBlock.java:378)
        at io.prestosql.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:357)
        at io.prestosql.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:275)
        at io.prestosql.spi.Page.getLoadedPage(Page.java:261)
        at io.prestosql.operator.TableScanOperator.getOutput(TableScanOperator.java:290)
        at io.prestosql.operator.Driver.processInternal(Driver.java:379)
        at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
        at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
        at io.prestosql.operator.Driver.processFor(Driver.java:276)
        at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
        at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
        at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
        at io.prestosql.$gen.Presto_326____20191205_193016_2.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@33fc99f6
        at org.apache.parquet.format.Util.read(Util.java:216)
        at org.apache.parquet.format.Util.readPageHeader(Util.java:65)
        at io.prestosql.parquet.reader.ParquetColumnChunk.readPageHeader(ParquetColumnChunk.java:57)
        at io.prestosql.parquet.reader.ParquetColumnChunk.readAllPages(ParquetColumnChunk.java:67)
        at io.prestosql.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:256)
        at io.prestosql.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:310)
        at io.prestosql.parquet.reader.ParquetReader.readBlock(ParquetReader.java:293)
        at io.prestosql.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:161)
        ... 16 more
Caused by: io.prestosql.hive.$internal.parquet.org.apache.thrift.protocol.TProtocolException: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@33fc99f6
        at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1055)
        at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:966)
        at org.apache.parquet.format.PageHeader.read(PageHeader.java:843)
        at org.apache.parquet.format.Util.read(Util.java:213)
        ... 23 more

About this issue

  • State: open
  • Created 5 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

Hi @hashhar, thanks for the information. Yes, I experienced the same kind of problem: the underlying data platform's jars did not support reading huge table data. The vendor has fixed the issue; no changes were needed on the Presto side…

@sib19 If you’re still here, can you please confirm whether the data source you were querying has some ETL job running on it that rewrites files, or whether you were using some kind of caching layer (e.g. Rubix)?

I ran into this issue today, and in my case it happened (most probably) due to Rubix. After dropping the cached files for that table, the error went away. I hit this with Presto 333.
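
For anyone who lands here: a quick way to tell whether the Parquet file itself is readable (as opposed to a stale or truncated cached copy) is to walk it with plain parquet-mr. The following is only a rough sketch, not a definitive checker: the class name, the command-line argument, and the assumption that parquet-hadoop and a Hadoop client are on the classpath are mine. If the page headers are corrupted, it should fail with the same "Required field 'uncompressed_page_size'" Thrift error seen in the trace above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.column.page.PageReadStore;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class ParquetPageCheck
{
    public static void main(String[] args)
            throws Exception
    {
        // Hypothetical entry point: pass the path of the suspect file,
        // e.g. a local copy or an hdfs://... / s3a://... URI.
        Path path = new Path(args[0]);
        Configuration conf = new Configuration();

        try (ParquetFileReader reader = ParquetFileReader.open(HadoopInputFile.fromPath(path, conf))) {
            // The footer is parsed first; a file truncated mid-write often
            // already fails at this point.
            for (BlockMetaData rowGroup : reader.getRowGroups()) {
                System.out.println("row group: rows=" + rowGroup.getRowCount()
                        + ", compressed bytes=" + rowGroup.getCompressedSize());
            }

            // Reading each row group forces the page headers to be
            // deserialized, which is where the Thrift error surfaces
            // if the page data is corrupted.
            long rows = 0;
            PageReadStore pages;
            while ((pages = reader.readNextRowGroup()) != null) {
                rows += pages.getRowCount();
            }
            System.out.println("all page headers read, total rows: " + rows);
        }
    }
}

If this check passes against the file as it sits in the source storage but the Presto query still fails, the corruption most likely lives in a cached or partially rewritten copy, which matches the ETL/Rubix observations above.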