trino: io.trino.spi.TrinoException: Failed reading parquet data: Socket is closed by peer
After upgrading to 361, I’m facing an issue when running a fairly straightforward query:
SELECT * FROM some_table WHERE some_column = 'some_value' LIMIT 10
However, if I remove the WHERE clause, it works as expected.
I have verified that the Parquet file is not corrupted in any way and is indeed readable. The same query, using the same data source, works as expected in v360.
Here’s the full error:
io.trino.spi.TrinoException: Failed reading parquet data; source= s3://<REDACTED>; can not read class org.apache.parquet.format.PageHeader: Socket is closed by peer.
at io.trino.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:230)
at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:396)
at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:375)
at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:282)
at io.trino.operator.project.DictionaryAwarePageFilter.filter(DictionaryAwarePageFilter.java:59)
at io.trino.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:121)
at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.lambda$processPageSource$1(ScanFilterAndProjectOperator.java:293)
at io.trino.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:245)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:277)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.trino.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:151)
at io.trino.operator.Driver.processInternal(Driver.java:387)
at io.trino.operator.Driver.lambda$processFor$9(Driver.java:291)
at io.trino.operator.Driver.tryWithLock(Driver.java:683)
at io.trino.operator.Driver.processFor(Driver.java:284)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
at io.trino.$gen.Trino_361____20210901_181740_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: Socket is closed by peer.
at org.apache.parquet.format.Util.read(Util.java:365)
at org.apache.parquet.format.Util.readPageHeader(Util.java:132)
at org.apache.parquet.format.Util.readPageHeader(Util.java:127)
at io.trino.parquet.reader.ParquetColumnChunk.readPageHeader(ParquetColumnChunk.java:76)
at io.trino.parquet.reader.ParquetColumnChunk.readAllPages(ParquetColumnChunk.java:89)
at io.trino.parquet.reader.ParquetReader.createPageReader(ParquetReader.java:388)
at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:368)
at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:444)
at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:427)
at io.trino.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:224)
... 39 more
Caused by: io.trino.hive.$internal.parquet.org.apache.thrift.transport.TTransportException: Socket is closed by peer.
at io.trino.hive.$internal.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:130)
at io.trino.hive.$internal.parquet.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at io.trino.hive.$internal.parquet.org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:635)
at io.trino.hive.$internal.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:541)
at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:155)
at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1026)
at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1019)
at org.apache.parquet.format.PageHeader.read(PageHeader.java:896)
at org.apache.parquet.format.Util.read(Util.java:362)
... 48 more
Running SET SESSION hive.parquet_ignore_statistics = true seems to bypass the issue: with it set, the query works as expected on v361.
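For anyone who wants to try the same workaround, here is a minimal sketch of the session I used. It assumes the catalog is named hive (the prefix of the session property is the catalog name, so adjust it if yours differs); some_table, some_column and 'some_value' are the same placeholders as in the query above. Since the failure only shows up with the WHERE clause, my guess is that it is tied to statistics-based filtering in the Parquet reader, but I have not confirmed that.

```sql
-- Assumption: the Hive catalog is called "hive"; replace the prefix otherwise.
-- Tell the Parquet reader to ignore file statistics for this session only.
SET SESSION hive.parquet_ignore_statistics = true;

-- The query that fails on 361 without the setting above.
SELECT * FROM some_table WHERE some_column = 'some_value' LIMIT 10;

-- Restore the default behaviour afterwards.
RESET SESSION hive.parquet_ignore_statistics;
```

This only hides the symptom for the current session, and ignoring statistics may make filtered scans read more data than they otherwise would.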
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 20 (10 by maintainers)
Hi, we are experiencing the same issue after migrating from Trino 356 to 362. We solved it by rolling back to version 360. Trino fails to read Parquet files (with column indexes) generated by a Spark job using AWS Glue as the metastore.
File schema with metadata
Glue table schema