trino: Missing manifest file when releasing the Hive Metastore lock has failed

HiveMetastoreTableOperations uses acquireLock and releaseLock while committing to an existing Iceberg table. releaseLock runs in the finally block, and that step can also fail, which results in the missing manifest file.

Steps To Reproduce:

CREATE TABLE tbl (
    id integer,
    name varchar,
    part varchar
)
WITH (
    location = '/path',
    format = 'PARQUET'
);

INSERT INTO tbl VALUES (1,'product', 'part1');

Suppose HiveMetastoreTableOperations’s line fails to execute at the Metastore level. It can fail because of a Metastore timeout, or, if the lock is missing on the HMS side, it can throw NoSuchLockException, which is an Exception and not a RuntimeException.

After that failure, any insert, select, or other query on this table fails, and the table becomes unusable:

INSERT INTO tbl VALUES (2,'product', 'part1');

Query 20220829_214404_00026_j9tvf, FAILED, 3 nodes
Splits: 68 total, 67 done (98.53%)
0.47 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20220829_214404_00026_j9tvf failed: Failed to get status for file: //path/tbl/metadata/snap-5290494244823571353-1-34a427a1-054c-4ecf-a49b-f3e3ef9203a0.avro

The likely reason: when the release-lock step fails, Iceberg’s commit handles only RuntimeException, so if NoSuchLockException is thrown, none of its handling steps run. It looks like the new snapshot was created in the previous step and points to a new manifest file, but that file is missing or was never created.
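To make the suspected sequence concrete, here is a minimal toy model in Java. It is only a sketch: every name in it (UnlockFailureDemo, commitWithCleanup, the in-memory "files" set) is hypothetical and not taken from Trino or Iceberg, and it models the unlock failure with an unchecked exception for simplicity. It shows how a failure raised from a finally block after a successful pointer swap can make the caller clean up a file that the metastore already references.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: none of these names come from Trino or Iceberg.
public class UnlockFailureDemo {
    // Files that currently exist in "storage".
    public static final Set<String> files = new HashSet<>();
    // The snapshot file the "metastore" currently points to.
    public static String currentSnapshot = null;

    // Stand-in for a lock release that fails on the metastore side.
    static void releaseLock() {
        throw new IllegalStateException("simulated: lock no longer exists");
    }

    // The pointer swap succeeds, but the caller only sees the unlock failure.
    public static void commit(String snapshotFile) {
        try {
            currentSnapshot = snapshotFile; // commit is already visible here
        } finally {
            releaseLock();                  // throws after the commit happened
        }
    }

    // Caller that treats any exception as "commit failed" and cleans up.
    public static void commitWithCleanup(String snapshotFile) {
        files.add(snapshotFile);
        try {
            commit(snapshotFile);
        } catch (RuntimeException e) {
            files.remove(snapshotFile);     // deletes a file that is now referenced
        }
    }

    // The table is readable only if the referenced snapshot file still exists.
    public static boolean tableIsReadable() {
        return currentSnapshot == null || files.contains(currentSnapshot);
    }

    public static void main(String[] args) {
        commitWithCleanup("snap-1.avro");
        System.out.println("readable after failed unlock: " + tableIsReadable());
    }
}
```

In this model the table ends up pointing at "snap-1.avro" while that file has been deleted, which mirrors the "Failed to get status for file" error above. The real cleanup path and exception types in Trino and Iceberg may differ from this simplification.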

Related issues: https://github.com/trinodb/trino/issues/14104 https://github.com/trinodb/trino/issues/12581

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (14 by maintainers)

Most upvoted comments

After handling https://github.com/trinodb/trino/issues/14104, the logic now throws a CommitStateUnknownException when doing the replaceTable operation.

https://github.com/trinodb/trino/blob/15dd728c6d2f865bc9ece9672a32df9314b7a4e7/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/HiveMetastoreTableOperations.java#L91-L96

So the operation which potentially can corrupt the table succeeded.

Regarding the lock release question: indeed, failing to unlock can also produce the unwanted situation where the table gets corrupted, because the metadata file gets removed.

It’s possible that the table remains locked until the metastore housekeeper eventually runs and cleans out the stale lock (which could take up to 5 minutes, IIRC).

In the absence of a more fine-grained mechanism for handling this kind of failure in Iceberg, we probably need to swallow the exception in order to avoid removing a metadata file which is already committed to the metastore.

https://github.com/apache/iceberg/blob/b9bbcfb0e4d8f2605d3c6fb2543cefdd5d09524d/core/src/main/java/org/apache/iceberg/BaseTransaction.java#L406-L414
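As a rough illustration of the "swallow the exception" idea, here is a small self-contained Java sketch. It is not the actual Trino or Iceberg code: the class, the in-memory "files" set, and the swallowedUnlockFailures counter are all hypothetical, and the counter stands in for whatever logging a real fix would do. The point is only that catching the unlock failure inside the finally block keeps a successful commit from being reported as failed.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: none of these names come from Trino or Iceberg.
public class SwallowUnlockFailureDemo {
    public static final Set<String> files = new HashSet<>();
    public static String currentSnapshot = null;
    // Hypothetical counter standing in for a log statement.
    public static int swallowedUnlockFailures = 0;

    // Stand-in for a lock release that fails with a checked exception.
    static void releaseLock() throws Exception {
        throw new Exception("simulated: lock no longer exists");
    }

    public static void commit(String snapshotFile) {
        try {
            currentSnapshot = snapshotFile; // commit is already visible here
        } finally {
            try {
                releaseLock();
            } catch (Exception e) {
                // Record the failure and continue: the commit itself succeeded,
                // so the caller must not see this as a failed commit.
                swallowedUnlockFailures++;
            }
        }
    }

    // Caller that cleans up the new file if the commit reports a failure.
    public static void commitWithCleanup(String snapshotFile) {
        files.add(snapshotFile);
        try {
            commit(snapshotFile);
        } catch (RuntimeException e) {
            files.remove(snapshotFile);
        }
    }

    public static boolean tableIsReadable() {
        return currentSnapshot == null || files.contains(currentSnapshot);
    }

    public static void main(String[] args) {
        commitWithCleanup("snap-1.avro");
        System.out.println("readable: " + tableIsReadable()
                + ", swallowed unlock failures: " + swallowedUnlockFailures);
    }
}
```

The trade-off in this sketch matches the one noted above: the lock is left behind on the metastore side until something else cleans it up, but the committed metadata file is not removed.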

@marton-bod do you want to put up a PR in the near future?

cc @electrum

Thanks for reporting @osscm! I have managed to reproduce this locally and have prepared a fix, for which I will open a PR shortly.

Incidentally, this same issue has already come up in the Iceberg hive-metastore module and was fixed accordingly by PR. Here’s what the code fragment currently looks like.

I would propose to keep it simple and keep it in sync with the iceberg hive-metastore connector solution.
Please let me know what you think.