iceberg: Spark: Spark SQL CALL procedures block (expire_snapshots and remove_orphan_files)
Spark version: 3.2.1 or 3.1.2
Spark SQL statements:
CALL hive_prod.system.expire_snapshots(table => 'test.iceberg_test_col_data_with_dt_02', older_than => timestamp '2022-05-18 19:02:00.595', retain_last => 1);
CALL hive_prod.system.remove_orphan_files(table => 'test.iceberg_test_col_data_with_dt_02', dry_run => true);
Executing these statements may cause the Spark job to block (on the last task of the collectAsList operator).
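For anyone trying to reproduce this against a different table or timestamp, here is a minimal sketch that builds the same two CALL statements as strings. The helper names `expire_snapshots_sql` and `remove_orphan_files_sql` are hypothetical and are not part of Iceberg or Spark; the catalog, table, and argument names are the ones from the statements above.

```python
from datetime import datetime


def expire_snapshots_sql(catalog: str, table: str,
                         older_than: datetime, retain_last: int = 1) -> str:
    """Build the expire_snapshots CALL statement (hypothetical helper)."""
    # Format with millisecond precision, e.g. 2022-05-18 19:02:00.595
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{ts}', "
        f"retain_last => {retain_last})"
    )


def remove_orphan_files_sql(catalog: str, table: str,
                            dry_run: bool = True) -> str:
    """Build the remove_orphan_files CALL statement (hypothetical helper)."""
    # SQL boolean literals are lowercase true/false
    return (
        f"CALL {catalog}.system.remove_orphan_files("
        f"table => '{table}', dry_run => {str(dry_run).lower()})"
    )


if __name__ == "__main__":
    table = "test.iceberg_test_col_data_with_dt_02"
    cutoff = datetime(2022, 5, 18, 19, 2, 0, 595000)
    print(expire_snapshots_sql("hive_prod", table, cutoff))
    print(remove_orphan_files_sql("hive_prod", table))
```

Each resulting string can be passed to `SparkSession.sql(...)` in a session where the `hive_prod` Iceberg catalog is configured. Keeping `dry_run => true` as the default means the orphan-file call only lists candidates rather than deleting them.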
About this issue
- State: closed
- Created 2 years ago
- Comments: 15 (1 by maintainers)
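I solved this problem. The job blocks when run in local mode, but Spark on YARN works normally. There are two ways to fix the blocking in local mode: modify the Spark source code, or modify the Iceberg source code. Modifying the Spark source is the more general fix for this blocking issue: edit the method org.apache.spark.scheduler.TaskSetManager#addPendingTask and comment out the section that handles Spark task priority.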