delta-rs: Python write_deltalake to S3 fails to write due to "invalid json"
Environment
Delta-rs version: 0.6.2
Binding: Python
Environment: Ubuntu 22.04, Python 3.10, deltalake==0.6.2, Running against non-AWS S3 (Swift)
Bug
What happened: write_deltalake fails with a PyDeltaTableError when writing a new table to S3.
My test code to write ‘df’ (a pandas dataframe) to an S3 location:
from deltalake import write_deltalake
storage_options = {"AWS_ACCESS_KEY_ID": ACCESS_KEY, "AWS_SECRET_ACCESS_KEY": SECRET_KEY, "AWS_ENDPOINT_URL": ENDPOINT_URL, "AWS_REGION": 'us-east-1'}
write_deltalake('s3://joshuarobinson/test_deltalake/', df, storage_options=storage_options)
fails with the following error:
Traceback (most recent call last):
  File "/delta_write.py", line 19, in <module>
    write_deltalake('s3://joshuarobinson/test_deltalake/', df, storage_options=storage_options)
  File "/usr/local/lib/python3.10/site-packages/deltalake/writer.py", line 156, in write_deltalake
    table = try_get_deltatable(table_or_uri)
  File "/usr/local/lib/python3.10/site-packages/deltalake/writer.py", line 332, in try_get_deltatable
    return DeltaTable(table_uri)
  File "/usr/local/lib/python3.10/site-packages/deltalake/table.py", line 91, in __init__
    self._table = RawDeltaTable(
deltalake.PyDeltaTableError: Failed to load checkpoint: Invalid JSON in checkpoint: expected value at line 1 column 1
Note that the destination path is empty, i.e., I’m writing a brand-new table:
$ s5cmd ls s3://joshuarobinson/test_deltalake/
ERROR "ls s3://joshuarobinson/test_deltalake/": no object found
Also tried:
- I tested all four values of “mode” and got the same result.
- I also tried manually building a pyarrow filesystem and passing that, but it did not work.
About this issue
- State: closed
- Created 2 years ago
- Comments: 22
@shazamkash Did you read the error message from the write?
The writer tried to make the table, but couldn’t complete the commit. That is why there is a tmpfile. This error message is intentional.
If you add AWS_S3_ALLOW_UNSAFE_RENAME=true (either as an environment variable or in storage_options), it should write successfully.
I currently get SignatureDoesNotMatch when providing credentials.
When doing
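The AWS_S3_ALLOW_UNSAFE_RENAME suggestion above can be sketched as follows. This is a minimal sketch, not a confirmed fix: the helper names build_storage_options and write_table are hypothetical, the option keys are the ones quoted in this thread, and the flag's name suggests it should only be used when no other writer is committing to the same table.

```python
from typing import Dict


def build_storage_options(
    access_key: str,
    secret_key: str,
    endpoint_url: str,
    region: str = "us-east-1",
    allow_unsafe_rename: bool = True,
) -> Dict[str, str]:
    """Build the storage_options dict used in this thread (hypothetical helper)."""
    options = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_ENDPOINT_URL": endpoint_url,
        "AWS_REGION": region,
    }
    if allow_unsafe_rename:
        # Flag suggested in the comment above; passed as a string value.
        options["AWS_S3_ALLOW_UNSAFE_RENAME"] = "true"
    return options


def write_table(uri: str, df, options: Dict[str, str]) -> None:
    """Write df to uri with the given options (hypothetical wrapper)."""
    # Imported lazily so the helper above is usable without deltalake installed.
    from deltalake import write_deltalake

    write_deltalake(uri, df, storage_options=options)
```

With the reporter's variables, this would be called as write_table('s3://joshuarobinson/test_deltalake/', df, build_storage_options(ACCESS_KEY, SECRET_KEY, ENDPOINT_URL)).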
@joshuarobinson I have the same issue.
Looking at the code, it currently expects the table to already exist:
write_deltalake performs:
When try_get_deltatable is called, it then calls DeltaTable.
It seems like it wants storage_options to initialise a new delta table, but currently it does not pass it through, even if you send it with write_deltalake. Strange behaviour indeed.
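To illustrate the point about storage_options not being passed through, here is a minimal sketch of a table probe that forwards the caller's options. It is not deltalake's actual code: try_get_table, fake_open_table, and TableNotFoundError are hypothetical stand-ins, and the toy opener only simulates what might happen when the endpoint is missing.

```python
from typing import Any, Callable, Dict, Optional


class TableNotFoundError(Exception):
    """Hypothetical error meaning 'no table exists at this location yet'."""


def try_get_table(
    uri: str,
    open_table: Callable[..., Any],
    storage_options: Optional[Dict[str, str]] = None,
) -> Optional[Any]:
    """Return the table at uri, or None if it does not exist yet."""
    try:
        # Forward the caller's options so the probe uses the same
        # endpoint and credentials as the write that follows.
        return open_table(uri, storage_options=storage_options)
    except TableNotFoundError:
        return None


def fake_open_table(uri: str, storage_options: Optional[Dict[str, str]] = None):
    """Toy opener: errors without an endpoint, otherwise reports 'not found'."""
    if not storage_options or "AWS_ENDPOINT_URL" not in storage_options:
        raise RuntimeError("request went to the default endpoint")
    raise TableNotFoundError(uri)
```

In this toy setup, a probe that receives the options returns None for an empty location and lets the write create the table, while a probe that drops them surfaces an unrelated error instead.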