delta-rs: Python write_deltalake() to Non-AWS S3 failing

Environment

Delta-rs version: 0.6.2

Binding: Python

Environment: Docker container

Python: 3.10.7

OS: Debian GNU/Linux 11 (bullseye)

S3: Non-AWS (Ceph based)


Bug

What happened:

Writing a Delta table to Ceph-based S3 (non-AWS) fails. I am writing to a path that does not already contain a Delta table or any other files.

I have also tried different write modes, but the write still fails with the same error.

My code:

import pandas as pd
import deltalake as dl

storage_options = {
    "AWS_ACCESS_KEY_ID": credentials.access_key,
    "AWS_SECRET_ACCESS_KEY": credentials.secret_key,
    "AWS_ENDPOINT_URL": "https://xxx.yyy.zzz.net",
}
df = pd.DataFrame({"x": [1, 2, 3]})
table_uri = "s3a://<bucket-name>/delta_test"
dl.writer.write_deltalake(table_uri, df, storage_options=storage_options)

Fails with the following error:

(screenshot of the error traceback)

Any idea what might be the problem? I am able to read the delta tables with the same storage_options.
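For S3-compatible stores such as Ceph or MinIO, delta-rs usually needs a couple of extra storage options beyond the credentials and endpoint. A minimal sketch, assuming the option names documented by delta-rs (`AWS_REGION`, and `AWS_S3_ALLOW_UNSAFE_RENAME` for single-writer setups without a DynamoDB locking client); the endpoint and credential values are placeholders:

```python
# Hedged sketch: storage options commonly needed when pointing delta-rs at a
# non-AWS, S3-compatible endpoint. All values below are placeholders.
storage_options = {
    "AWS_ACCESS_KEY_ID": "my-access-key",          # placeholder credential
    "AWS_SECRET_ACCESS_KEY": "my-secret-key",      # placeholder credential
    "AWS_ENDPOINT_URL": "https://xxx.yyy.zzz.net",
    # Many S3-compatible stores ignore the region but the client requires one:
    "AWS_REGION": "us-east-1",
    # Without a locking client, delta-rs refuses S3 writes unless this is set.
    # Only safe when a single writer touches the table:
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
}

# The write call itself is unchanged, e.g.:
# dl.writer.write_deltalake("s3a://<bucket-name>/delta_test", df,
#                           storage_options=storage_options)
```

Whether the unsafe-rename flag is needed depends on the delta-rs version; newer releases document it as the opt-out from the S3 locking requirement.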

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 15

Most upvoted comments

The EntityTooSmall error is a bug in the S3 implementation, and it’s triggered if any files in the table are over 5 MB. (It doesn’t seem to happen in the local emulators we use for testing, but does happen in AWS S3.) I have a fix ready in https://github.com/apache/arrow-rs/pull/3234, which will hopefully be included in the next release. Thanks for reporting this!
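For context, EntityTooSmall comes from S3's multipart-upload rule: every part except the last must be at least 5 MiB, and `CompleteMultipartUpload` is rejected if any non-final part is smaller. A toy sketch of the rule (the helper names and sizes are illustrative, not arrow-rs code):

```python
MIN_PART = 5 * 1024 * 1024  # S3 minimum for every part except the last


def split_parts(total_size: int, part_size: int) -> list[int]:
    """Chunk an object of total_size bytes into parts of at most part_size."""
    parts = []
    remaining = total_size
    while remaining > 0:
        parts.append(min(part_size, remaining))
        remaining -= parts[-1]
    return parts


def violates_s3_rule(parts: list[int]) -> bool:
    # EntityTooSmall fires when any *non-final* part is under 5 MiB.
    return any(p < MIN_PART for p in parts[:-1])


# 5 MiB parts with a small final remainder are fine: only non-final parts
# must meet the minimum.
ok = split_parts(6 * 1024 * 1024, MIN_PART)          # [5 MiB, 1 MiB]

# A writer that flushes undersized intermediate parts (e.g. 3 MiB chunks)
# produces a non-final part below the minimum and triggers EntityTooSmall.
bad = split_parts(6 * 1024 * 1024, 3 * 1024 * 1024)  # [3 MiB, 3 MiB]
```

This is consistent with the comment above: files over 5 MB are exactly the ones that go through the multipart path, so smaller writes never hit the bug.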

Hello! I will give it a go, will let you know as soon as possible!

I currently get SignatureDoesNotMatch when providing credentials.