delta-rs: _internal.DeltaError when merging

Environment

Delta-rs version: 0.15.1

Binding: Python

Environment:

  • Cloud provider: Azure (deltatable stored in Azure)
  • OS: Linux/OpenShift

Bug

What happened: I try to merge data into an existing deltatable like so:

(
    deltatable.merge(
        mergeable_table,
        "target._merge_key = source._merge_key",
        source_alias="source",
        target_alias="target",
    )
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute()
)

This sometimes works, and sometimes fails with the following error:

_internal.DeltaError: Generic DeltaTable error: Schema error: No field named target._merge_key. Valid fields are _merge_key, <other camel-case columns>

What you expected to happen: I expect the merge to succeed. The error claims that target._merge_key is not a valid field name, but _merge_key is, which is a bit strange.

How to reproduce it: I have had trouble reproducing this reliably, including locally. The merge operations are run as part of a batch job that begins with

deltatable.optimize.compact()
deltatable.vacuum(dry_run=False, retention_hours=48, enforce_retention_duration=False)
deltatable.cleanup_metadata()

followed by a number of merge operations. If the first merge fails, all subsequent merge operations also fail. Sometimes the first merge succeeds, and then all subsequent merges also seem to succeed. So I wonder if any of these optimize/vacuum/cleanup methods could (sometimes) corrupt the table? (Maybe relevant: Are these operations run synchronously or asynchronously?)
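To narrow down whether the maintenance steps change what the table looks like before the first merge, it may help to record the table version and column names before every step. The following is only a sketch: `snapshot_table_state` and `run_with_diagnostics` are hypothetical helper names, and the code assumes that `deltatable` is a `deltalake.DeltaTable`, whose `version()` and `schema().fields` it reads.

```python
from typing import Any, Callable, Dict, List, Tuple


def snapshot_table_state(deltatable: Any) -> Dict[str, Any]:
    """Collect the table version and column names (assumes a deltalake.DeltaTable)."""
    return {
        "version": deltatable.version(),
        "columns": [field.name for field in deltatable.schema().fields],
    }


def run_with_diagnostics(
    deltatable: Any, steps: List[Tuple[str, Callable[[], Any]]]
) -> List[Dict[str, Any]]:
    """Run each named step, recording the table state seen before it and any error."""
    report = []
    for name, step in steps:
        entry = {"step": name, **snapshot_table_state(deltatable), "error": None}
        try:
            step()
        except Exception as exc:  # keep going so later steps are recorded too
            entry["error"] = f"{type(exc).__name__}: {exc}"
        report.append(entry)
    return report


# Hypothetical usage with the batch job from this issue:
# report = run_with_diagnostics(deltatable, [
#     ("compact", lambda: deltatable.optimize.compact()),
#     ("vacuum", lambda: deltatable.vacuum(
#         dry_run=False, retention_hours=48, enforce_retention_duration=False)),
#     ("cleanup", lambda: deltatable.cleanup_metadata()),
#     ("merge", run_first_merge),  # run_first_merge: your own merge callable
# ])
```

Comparing the recorded version and columns between a failing run and a succeeding run would show whether the table state differs before the first merge.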

About this issue

  • State: closed
  • Created 5 months ago
  • Comments: 20 (4 by maintainers)

Most upvoted comments

I’m using merge

I have the same problem with multiple writes to a table partitioned by one column. The first write is fine, but subsequent writes throw a DeltaError exception, and the _delta_log directory contains several _commit_xxxxx.json.tmp files. The data is saved to S3, but the Delta Lake version is not updated.
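The symptom described in this comment (leftover temporary commit files alongside a version that does not advance) can be checked with a small script. This is a minimal sketch that assumes the table is available on a local filesystem path; for a table on S3 or Azure the same listing would have to go through the storage client instead. The function names are hypothetical, and the only format details relied on are the `_commit_*.json.tmp` pattern quoted above and the zero-padded 20-digit `.json` names of committed log entries.

```python
import re
from pathlib import Path
from typing import List

# Committed Delta log entries are named with a zero-padded 20-digit version.
_COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def find_temp_commits(table_path: str) -> List[str]:
    """Return the names of leftover temporary commit files in _delta_log."""
    log_dir = Path(table_path) / "_delta_log"
    if not log_dir.is_dir():
        return []
    return sorted(p.name for p in log_dir.glob("_commit_*.json.tmp"))


def committed_versions(table_path: str) -> List[int]:
    """Return the versions that have a committed JSON log entry, in ascending order."""
    log_dir = Path(table_path) / "_delta_log"
    if not log_dir.is_dir():
        return []
    versions = []
    for p in log_dir.iterdir():
        match = _COMMIT_RE.match(p.name)
        if match:
            versions.append(int(match.group(1)))
    return sorted(versions)
```

If `find_temp_commits` returns entries after a failed write while `committed_versions` has not grown, that matches the behaviour reported here: the data files were written but the commit was never finalized.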