iceberg: Spark: Spark SQL CALL procedures blocking (expire_snapshots and delete orphan files)

Spark version: 3.2.1 or 3.1.2

Spark SQL statements:

    CALL hive_prod.system.expire_snapshots(table => 'test.iceberg_test_col_data_with_dt_02', older_than => timestamp '2022-05-18 19:02:00.595', retain_last => 1);
    CALL hive_prod.system.remove_orphan_files(table => 'test.iceberg_test_col_data_with_dt_02', dry_run => true);

Executing these statements can cause the Spark job to block: it hangs on the last task of the collectAsList operator.
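For reference, both procedure calls in the report use Iceberg's named-argument syntax (`table =>`, `older_than =>`, `retain_last =>`, `dry_run =>`). A minimal sketch, assuming you build these statements in Python before passing them to a SparkSession via `spark.sql(...)`; the helpers `sql_str`, `expire_snapshots_sql` and `remove_orphan_files_sql` are hypothetical and not part of Iceberg or Spark. Generating the statements this way does not by itself avoid the blocking described here.

```python
from datetime import datetime


def sql_str(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def expire_snapshots_sql(catalog: str, table: str, older_than: datetime, retain_last: int) -> str:
    """Build a hypothetical CALL <catalog>.system.expire_snapshots(...) statement."""
    # Millisecond-precision timestamp, e.g. 2022-05-18 19:02:00.595
    # (%f yields microseconds, so drop the last three digits).
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => {sql_str(table)}, "
        f"older_than => TIMESTAMP {sql_str(ts)}, "
        f"retain_last => {retain_last})"
    )


def remove_orphan_files_sql(catalog: str, table: str, dry_run: bool = True) -> str:
    """Build a hypothetical CALL <catalog>.system.remove_orphan_files(...) statement."""
    return (
        f"CALL {catalog}.system.remove_orphan_files("
        f"table => {sql_str(table)}, "
        f"dry_run => {'true' if dry_run else 'false'})"
    )


if __name__ == "__main__":
    when = datetime(2022, 5, 18, 19, 2, 0, 595000)
    table = "test.iceberg_test_col_data_with_dt_02"
    # Each string would then be passed to spark.sql(...) in an Iceberg-enabled session.
    print(expire_snapshots_sql("hive_prod", table, when, 1))
    print(remove_orphan_files_sql("hive_prod", table))
```

With `dry_run => true`, `remove_orphan_files` only lists the orphan candidates instead of deleting them, which is the mode used in the report.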

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (1 by maintainers)

Most upvoted comments

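I ran into this problem too. Both expiring snapshots and deleting orphan files get stuck on the last task, with no progress after 10 hours.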

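I solved this problem. The blocking happens when running in local mode; Spark on YARN works normally. If you need to fix the blocking in local mode, there are two options: modify the Spark source code or modify the Iceberg source code. Modifying the Spark source is the more general fix for this blocking: in the method org.apache.spark.scheduler.TaskSetManager#addPendingTask, comment out the section that handles Spark task priority.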