delta-rs: pyo3_runtime.PanicException: not stream while reading DeltaTable

Environment

Delta-rs version: 0.6.1

Binding: python

Environment:

  • Cloud provider: AWS
  • OS: Linux Mint

Bug

What happened: Reading a delta table with:

DeltaTable(
    table_uri=self.path_to_table,
    storage_options={
        "AWS_REGION": aws_region,
        "AWS_ENDPOINT_URL": f"s3.{aws_region}.amazonaws.com",
        "AWS_ACCESS_KEY_ID": f"{aws_access_key_id}",
        "AWS_SECRET_ACCESS_KEY": f"{aws_secret_access_key}",
    },
)

yields the following error:

thread '<unnamed>' panicked at 'not stream', /root/.cargo/git/checkouts/arrow-rs-25d656fcab36794a/5f56754/object_store/src/aws/credential.rs:170:14
stack backtrace:
   0:     0x7f168654043d - <unknown>
   1:     0x7f16865682bc - <unknown>
   2:     0x7f168653b301 - <unknown>
   3:     0x7f1686541bb5 - <unknown>
   4:     0x7f16865418d6 - <unknown>
   5:     0x7f1686542146 - <unknown>
   6:     0x7f1686542037 - <unknown>
   7:     0x7f16865408f4 - <unknown>
   8:     0x7f1686541d69 - <unknown>
   9:     0x7f16859da073 - <unknown>
  10:     0x7f1686565021 - <unknown>
  11:     0x7f1686564fcb - <unknown>
  12:     0x7f16859d9ee6 - <unknown>
  13:     0x7f1685e383e3 - <unknown>
  14:     0x7f1685e5ad2b - <unknown>
  15:     0x7f1685e5f927 - <unknown>
  16:     0x7f1685b1a192 - <unknown>
  17:     0x7f1685b24980 - <unknown>
  18:     0x7f1685a05e16 - <unknown>
  19:     0x7f1685a0703c - <unknown>
  20:     0x7f1685a06627 - <unknown>
  21:     0x7f1685a56b2e - <unknown>
  22:     0x7f1685a2d33d - <unknown>
  23:     0x7f1685a3aad4 - <unknown>
  24:     0x7f1685a6492d - <unknown>
  25:     0x7f1685a73c92 - <unknown>
  26:     0x7f16859f1bcd - <unknown>
  27:     0x7f1685a79972 - <unknown>
  28:           0x5f3d03 - _PyObject_MakeTpCall
  29:           0x570af9 - _PyEval_EvalFrameDefault
  30:           0x56939a - _PyEval_EvalCodeWithName
  31:           0x5f6a13 - _PyFunction_Vectorcall
  32:           0x59bfb7 - <unknown>
  33:           0x5f3d7f - _PyObject_MakeTpCall
  34:           0x570af9 - _PyEval_EvalFrameDefault
  35:           0x56939a - _PyEval_EvalCodeWithName
  36:           0x68d047 - PyEval_EvalCode
  37:           0x67e351 - <unknown>
  38:           0x67e3cf - <unknown>
  39:           0x67e471 - <unknown>
  40:           0x67e817 - PyRun_SimpleFileExFlags
  41:           0x6b6fe2 - Py_RunMain
  42:           0x6b736d - Py_BytesMain
  43:     0x7f16879640b3 - __libc_start_main
  44:           0x5fa5ce - _start
  45:                0x0 - <unknown>

Traceback (most recent call last):
  File "test.py", line 15, in <module>
    dt = DeltaTable(table_uri=path,
  File "<local_path>/python3.8/site-packages/deltalake/table.py", line 91, in __init__
    self._table = RawDeltaTable(
pyo3_runtime.PanicException: not stream

What you expected to happen: The Delta table is read successfully.

How to reproduce it: See "What happened" above.

More details:

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (3 by maintainers)

Most upvoted comments

I resolved this issue by defining my local MinIO service in Kubernetes (K8s) as follows:

apiVersion: v1
kind: Service
metadata:
  name: trino-minio-svc
  namespace: trino
spec:
  type: NodePort
  ports:
    - name: "9000"
      port: 9000
      targetPort: 9000
      nodePort: 30000
    - name: "9001"
      port: 9001
      targetPort: 9001
      nodePort: 30001
  selector:
    app: minio

I then used the following storage options:

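import os
from typing import Dict


class DeltaTableReader:  # hypothetical enclosing class; the original showed only this method
    @staticmethod
    def _setup_storage_options(aws_region: str) -> Dict[str, str]:
        # Let delta-rs write to S3 without a locking provider (only safe with a single writer)
        os.environ["AWS_S3_ALLOW_UNSAFE_RENAME"] = "true"
        # Permit plain-HTTP (non-TLS) connections, needed for the local http:// MinIO endpoint
        os.environ["AWS_STORAGE_ALLOW_HTTP"] = "1"

        return {
            "AWS_ACCESS_KEY_ID": os.environ["AWS_ACCESS_KEY_ID"],
            "AWS_SECRET_ACCESS_KEY": os.environ["AWS_SECRET_ACCESS_KEY"],
            "AWS_REGION": aws_region,
            # MinIO's API port 9000, exposed on the node as NodePort 30000
            "AWS_ENDPOINT_URL": "http://localhost:30000",
        }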

This worked. It was also important to use the 9000 -> 30000 mapping for the endpoint, since 9000 is MinIO's API port; 9001 -> 30001 is the UI port.
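The port mapping in the Service definition above can be summarized in a small sketch: the endpoint passed to delta-rs must target the node port that fronts MinIO's API port (9000), not the UI port (9001). The names `MINIO_NODE_PORTS`, `MINIO_API_PORT`, and `minio_endpoint` are hypothetical helpers for illustration, not part of deltalake or MinIO.

```python
from typing import Dict

# Hypothetical summary of the NodePort Service above: container port -> node port.
MINIO_NODE_PORTS: Dict[int, int] = {
    9000: 30000,  # MinIO S3 API
    9001: 30001,  # MinIO UI
}
MINIO_API_PORT = 9000


def minio_endpoint(host: str = "localhost") -> str:
    """Return the plain-HTTP endpoint for MinIO's S3 API via its node port."""
    node_port = MINIO_NODE_PORTS[MINIO_API_PORT]
    return f"http://{host}:{node_port}"


print(minio_endpoint())  # http://localhost:30000
```

Because the endpoint uses `http://`, plain-HTTP access also has to be allowed, which is what the `AWS_STORAGE_ALLOW_HTTP` setting in the storage options above does.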

Hi @roeap, thanks for your reply. I ran the code twice: first with the endpoint URL changed to include the scheme ("https://s3.<region>.amazonaws.com"), and then with AWS_ENDPOINT_URL removed entirely. In both cases I received the following error:

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    dt = DeltaTable(table_uri=path,
  File "<local_path>/lib/python3.8/site-packages/deltalake/table.py", line 91, in __init__
    self._table = RawDeltaTable(
deltalake.PyDeltaTableError: Failed to read delta log object: Generic S3 error: Error performing get request <correct_s3_path>/_delta_log/00000000000000000000.json: response error "No Body", after 0 retries: HTTP status client error (403 Forbidden) for url (https://s3.<aws_region>.amazonaws.com/<correct_s3_path>/_delta_log/00000000000000000000.json)
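The original report passed `AWS_ENDPOINT_URL` as `s3.{aws_region}.amazonaws.com`, with no URL scheme, and the follow-up above tried adding `https://`. The sketch below is a hypothetical helper (`ensure_endpoint_scheme` is not part of deltalake) that normalizes the endpoint before the options are handed to `DeltaTable`. It assumes `https` is the right default for AWS; it does nothing about the 403 Forbidden response, which means the request reached the server and was rejected.

```python
from typing import Dict


def ensure_endpoint_scheme(options: Dict[str, str], default_scheme: str = "https") -> Dict[str, str]:
    """Return a copy of the storage options whose AWS_ENDPOINT_URL has an explicit scheme."""
    result = dict(options)  # never mutate the caller's dict
    endpoint = result.get("AWS_ENDPOINT_URL")
    if endpoint is None:
        # No custom endpoint: the default AWS endpoint for the region is used.
        return result
    if not endpoint.lower().startswith(("http://", "https://")):
        # Prepend the default scheme to a bare host such as "s3.eu-west-1.amazonaws.com".
        result["AWS_ENDPOINT_URL"] = f"{default_scheme}://{endpoint}"
    return result


opts = {"AWS_REGION": "eu-west-1", "AWS_ENDPOINT_URL": "s3.eu-west-1.amazonaws.com"}
print(ensure_endpoint_scheme(opts)["AWS_ENDPOINT_URL"])  # https://s3.eu-west-1.amazonaws.com
```

The result would then be passed as `storage_options=ensure_endpoint_scheme(opts)` in the `DeltaTable(...)` call. Endpoints that already carry a scheme, such as the local `http://localhost:30000` MinIO endpoint, are left unchanged.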