trino: io.trino.spi.TrinoException: Failed reading parquet data: Socket is closed by peer

After upgrading to 361, I’m facing an issue when running a fairly straightforward query:

SELECT * FROM some_table WHERE some_column = 'some_value' LIMIT 10

However, if I remove the WHERE clause, it works as expected.

I have verified that the parquet file is not corrupted in any way and is indeed readable. The same query, against the same data source, works as expected in v360.
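To summarize the behavior (table and column names are the placeholders from above):

-- Fails on v361 with the error below:
SELECT * FROM some_table WHERE some_column = 'some_value' LIMIT 10;

-- Works on v361; both variants work on v360:
SELECT * FROM some_table LIMIT 10;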

Here’s the full error:

io.trino.spi.TrinoException: Failed reading parquet data; source= s3://<REDACTED>; can not read class org.apache.parquet.format.PageHeader: Socket is closed by peer.
	at io.trino.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:230)
	at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:396)
	at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:375)
	at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:282)
	at io.trino.operator.project.DictionaryAwarePageFilter.filter(DictionaryAwarePageFilter.java:59)
	at io.trino.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:121)
	at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.lambda$processPageSource$1(ScanFilterAndProjectOperator.java:293)
	at io.trino.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:245)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:277)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.trino.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:151)
	at io.trino.operator.Driver.processInternal(Driver.java:387)
	at io.trino.operator.Driver.lambda$processFor$9(Driver.java:291)
	at io.trino.operator.Driver.tryWithLock(Driver.java:683)
	at io.trino.operator.Driver.processFor(Driver.java:284)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
	at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.trino.$gen.Trino_361____20210901_181740_2.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: Socket is closed by peer.
	at org.apache.parquet.format.Util.read(Util.java:365)
	at org.apache.parquet.format.Util.readPageHeader(Util.java:132)
	at org.apache.parquet.format.Util.readPageHeader(Util.java:127)
	at io.trino.parquet.reader.ParquetColumnChunk.readPageHeader(ParquetColumnChunk.java:76)
	at io.trino.parquet.reader.ParquetColumnChunk.readAllPages(ParquetColumnChunk.java:89)
	at io.trino.parquet.reader.ParquetReader.createPageReader(ParquetReader.java:388)
	at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:368)
	at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:444)
	at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:427)
	at io.trino.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:224)
	... 39 more
Caused by: io.trino.hive.$internal.parquet.org.apache.thrift.transport.TTransportException: Socket is closed by peer.
	at io.trino.hive.$internal.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:130)
	at io.trino.hive.$internal.parquet.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at io.trino.hive.$internal.parquet.org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:635)
	at io.trino.hive.$internal.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:541)
	at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:155)
	at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1026)
	at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1019)
	at org.apache.parquet.format.PageHeader.read(PageHeader.java:896)
	at org.apache.parquet.format.Util.read(Util.java:362)
	... 48 more

SET SESSION hive.parquet_ignore_statistics = true seems to bypass the issue; with that session property set, the query works as expected on v361 (see the example below).
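For anyone hitting the same thing, the workaround looks like this. The catalog name (hive) is taken from the snippet above and may differ in your setup, and the catalog-level property name in the comment is my assumption, so verify it against the Hive connector docs for your Trino version:

-- Per-session workaround (assumes the Parquet-backed catalog is named "hive"):
SET SESSION hive.parquet_ignore_statistics = true;

-- Re-run the failing query in the same session:
SELECT * FROM some_table WHERE some_column = 'some_value' LIMIT 10;

-- If I remember right, the equivalent catalog configuration property
-- (set in the hive catalog's properties file) is:
-- parquet.ignore-statistics=true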

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 20 (10 by maintainers)

Most upvoted comments

Hi, we are experiencing the same issue after migrating from Trino 356 to 362; we solved it by rolling back to version 360. Trino fails to read parquet files (with column indexes) generated by a Spark job, using AWS Glue as the metastore.

File schema with metadata

creator:           parquet-mr version 1.11.1 (build 765bd5cd7fdef2af1cecd0755000694b992bfadd) 
extra:             writer.time.zone = UTC 

file schema:       hive_schema 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
oid:               OPTIONAL BINARY L:STRING R:0 D:1
mar:               OPTIONAL INT32 R:0 D:1
ui:                OPTIONAL INT64 R:0 D:1
nog:               OPTIONAL INT32 L:INTEGER(16,true) R:0 D:1
ref:               OPTIONAL BOOLEAN R:0 D:1
din:               OPTIONAL BOOLEAN R:0 D:1
lu:                OPTIONAL BOOLEAN R:0 D:1
brk:               OPTIONAL BOOLEAN R:0 D:1
chind:             OPTIONAL INT32 L:DATE R:0 D:1

row group 1:       RC:711710 TS:34045865 OFFSET:4 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
oid:                BINARY SNAPPY DO:0 FPO:4 SZ:28784309/31316571/1,09 VC:711710 ENC:BIT_PACKED,RLE,PLAIN ST:[min: <REDACTED>, max: <REDACTED>, num_nulls: 0]
mar:                INT32 SNAPPY DO:0 FPO:28784313 SZ:358774/358582/1,00 VC:711710 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE ST:[min: 0, max: 10, num_nulls: 0]
ui:                 INT64 SNAPPY DO:0 FPO:29143087 SZ:1469763/1558941/1,06 VC:711710 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE ST:[min: 1, max: 7498053, num_nulls: 0]
nog:                INT32 SNAPPY DO:0 FPO:30612850 SZ:437530/447375/1,02 VC:711710 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE ST:[min: 1, max: 35, num_nulls: 0]
ref:                BOOLEAN SNAPPY DO:0 FPO:31050380 SZ:90402/90222/1,00 VC:711710 ENC:BIT_PACKED,RLE,PLAIN ST:[min: false, max: true, num_nulls: 0]
din:                BOOLEAN SNAPPY DO:0 FPO:31140782 SZ:66058/90224/1,37 VC:711710 ENC:BIT_PACKED,RLE,PLAIN ST:[min: false, max: true, num_nulls: 0]
lu:                 BOOLEAN SNAPPY DO:0 FPO:31206840 SZ:29012/90224/3,11 VC:711710 ENC:BIT_PACKED,RLE,PLAIN ST:[min: false, max: true, num_nulls: 0]
brk:                BOOLEAN SNAPPY DO:0 FPO:31235852 SZ:90401/90221/1,00 VC:711710 ENC:BIT_PACKED,RLE,PLAIN ST:[min: false, max: true, num_nulls: 0]
chind:              INT32 SNAPPY DO:0 FPO:31326253 SZ:2956/3505/1,19 VC:711710 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE ST:[min: <REDACTED>, max: <REDACTED>, num_nulls: 0]

Glue table schema

oid       varchar(40)
mar       integer
ui        bigint
nog       smallint
ref       boolean
din       boolean
lu        boolean
brk       boolean
chind     date
dp        date (partition_key)
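
For completeness, the equivalent table declared through Trino would look roughly like this; the catalog, schema, table name, and location are placeholders rather than the actual ones, and this is only a sketch to show the type mapping and the partition key:

CREATE TABLE hive.some_schema.some_table (
    oid    varchar(40),
    mar    integer,
    ui     bigint,
    nog    smallint,
    ref    boolean,
    din    boolean,
    lu     boolean,
    brk    boolean,
    chind  date,
    -- partition columns must come last in the column list
    dp     date
)
WITH (
    format = 'PARQUET',
    partitioned_by = ARRAY['dp'],
    external_location = 's3://<REDACTED>/'
);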