iceberg: Spark: Extend expire_snapshots procedure with an optional arg for snapshot ids

I think we should extend our expire_snapshots procedure to also accept an optional list of snapshot IDs. That way we match the behavior of the Table API. It would be useful when someone has generated a lot of compaction snapshots and wants to expire them even though they are recent.

  private static final ProcedureParameter[] PARAMETERS = new ProcedureParameter[] {
      ProcedureParameter.required("table", DataTypes.StringType),
      ProcedureParameter.optional("older_than", DataTypes.TimestampType),
      ProcedureParameter.optional("retain_last", DataTypes.IntegerType),
      // new optional parameter to specify snapshot ids to expire
      ProcedureParameter.optional("snapshots_ids", DataTypes.createArrayType(DataTypes.LongType))
  };
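To make the intended semantics concrete, here is a minimal, self-contained sketch of how an explicit ID list could combine with older_than. It does not use Iceberg classes: `Snap` and `selectForExpiry` are hypothetical names, and the rule that explicit IDs are expired in addition to age-based matches is an assumption rather than something this issue confirms. retain_last is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ExpireSelectionSketch {
  // Hypothetical stand-in for a table snapshot: an ID plus a commit timestamp.
  static final class Snap {
    final long id;
    final long timestampMillis;

    Snap(long id, long timestampMillis) {
      this.id = id;
      this.timestampMillis = timestampMillis;
    }
  }

  // Assumption: a snapshot is expired if it is older than the cutoff
  // OR its ID was listed explicitly. Either argument may be null (not supplied).
  static List<Long> selectForExpiry(List<Snap> history, Long olderThanMillis, Set<Long> snapshotIds) {
    List<Long> expired = new ArrayList<>();
    for (Snap s : history) {
      boolean byAge = olderThanMillis != null && s.timestampMillis < olderThanMillis;
      boolean byId = snapshotIds != null && snapshotIds.contains(s.id);
      if (byAge || byId) {
        expired.add(s.id);
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    List<Snap> history = List.of(
        new Snap(1L, 100L), new Snap(2L, 200L), new Snap(3L, 300L), new Snap(4L, 400L));
    // Snapshot 1 matches by age, snapshot 3 matches by explicit ID.
    System.out.println(selectForExpiry(history, 150L, Set.of(3L))); // prints [1, 3]
  }
}
```

With only the ID set supplied, recent compaction snapshots can be selected without touching the age cutoff, which is the use case described above.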

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 19 (18 by maintainers)

Most upvoted comments

@Neuw84, I think we switched the approach used in the table API to leverage a reachability set. I assume it should be safe.
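The reachability idea can be illustrated with a small sketch: a file is only deleted if no retained snapshot still references it, so expiring an arbitrary snapshot by ID cannot remove data that a live snapshot needs. This is a simplified model, not Iceberg's implementation; `filesSafeToDelete` and the map of snapshot ID to file paths are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilitySketch {
  // Returns files referenced only by expired snapshots.
  // filesBySnapshot is a hypothetical map from snapshot ID to the files it references.
  static Set<String> filesSafeToDelete(Map<Long, List<String>> filesBySnapshot, Set<Long> expiredIds) {
    Set<String> stillReachable = new HashSet<>();
    Set<String> candidates = new HashSet<>();
    for (Map.Entry<Long, List<String>> entry : filesBySnapshot.entrySet()) {
      if (expiredIds.contains(entry.getKey())) {
        candidates.addAll(entry.getValue());
      } else {
        stillReachable.addAll(entry.getValue());
      }
    }
    // Anything a retained snapshot can still reach must be kept.
    candidates.removeAll(stillReachable);
    return candidates;
  }

  public static void main(String[] args) {
    Map<Long, List<String>> files = Map.of(
        1L, List.of("a", "b"),
        2L, List.of("b", "c"),
        3L, List.of("c", "d"));
    // Snapshot 2 shares all of its files with retained snapshots 1 and 3.
    System.out.println(filesSafeToDelete(files, Set.of(2L))); // prints []
  }
}
```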