delta-rs: _internal.DeltaError when merging
Environment
Delta-rs version: 0.15.1
Binding: Python
Environment:
- Cloud provider: Azure (deltatable stored in Azure)
- OS: Linux/OpenShift
Bug
What happened: I try to merge data into an existing Delta table like so:

```python
(
    deltatable.merge(
        mergeable_table,
        "target._merge_key = source._merge_key",
        source_alias="source",
        target_alias="target",
    )
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute()
)
```
This sometimes works, and sometimes fails with the following error:

```
_internal.DeltaError: Generic DeltaTable error: Schema error: No field named target._merge_key. Valid fields are _merge_key, <other camel-case columns>
```
What you expected to happen:
I expect the merge to succeed. The error claims that `target._merge_key` is not a valid field name, yet it lists `_merge_key` itself among the valid fields, which suggests the `target` alias is not being resolved when the predicate is evaluated.
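As a sanity check, the loaded snapshot can be inspected directly. A minimal sketch, assuming `table_uri` is a placeholder for the actual Azure table path (not from the report):

```python
# Sketch: confirm the loaded snapshot actually contains _merge_key.
# `table_uri` is a placeholder, not the path from the report.
from deltalake import DeltaTable

deltatable = DeltaTable(table_uri)
fields = [f.name for f in deltatable.schema().fields]
assert "_merge_key" in fields, f"_merge_key missing; valid fields: {fields}"
```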
How to reproduce it: I have had trouble reproducing this locally or reliably. The merge operations run as part of a batch job that begins with

```python
deltatable.optimize.compact()
deltatable.vacuum(dry_run=False, retention_hours=48, enforce_retention_duration=False)
deltatable.cleanup_metadata()
```

followed by a number of merge operations. If the first merge fails, all subsequent merges also fail; when the first merge succeeds, the subsequent merges seem to succeed as well. So I wonder whether any of these optimize/vacuum/cleanup methods could (sometimes) corrupt the table. (Maybe relevant: are these operations run synchronously or asynchronously?)
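One thing that might be worth trying is refreshing the table handle after the maintenance pass so the merges plan against the latest log state. A sketch, not a confirmed fix; whether a stale snapshot is the cause here is an assumption, and `batches` is a placeholder for the job's input tables:

```python
# Sketch: refresh the snapshot between the maintenance pass and the merges.
deltatable.update_incremental()  # re-read _delta_log up to the latest version

for mergeable_table in batches:  # placeholder for the job's input tables
    (
        deltatable.merge(
            mergeable_table,
            "target._merge_key = source._merge_key",
            source_alias="source",
            target_alias="target",
        )
        .when_matched_update_all()
        .when_not_matched_insert_all()
        .execute()
    )
```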
About this issue
- State: closed
- Created 5 months ago
- Comments: 20 (4 by maintainers)
I’m using merge
I have the same problem! Multiple writes to a table partitioned by one column: the first write is fine, but subsequent writes throw a DeltaError exception, and the _delta_log directory contains several `_commit_xxxxx.json.tmp` files. The data files are written to S3, but no new Delta table versions are committed.
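To quantify that, here is one way to list the leftover temp commit files. A sketch using boto3, which the report itself does not mention; the bucket and prefix are placeholders:

```python
# Sketch: list orphaned _commit_*.json.tmp files left in _delta_log on S3.
# "my-bucket" and "path/to/table" are placeholders.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="path/to/table/_delta_log/")
tmp = [o["Key"] for o in resp.get("Contents", []) if o["Key"].endswith(".json.tmp")]
print(f"{len(tmp)} leftover temp commits:", tmp)
```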