delta-rs: Not able to access Azure Delta Lake

Discussed in https://github.com/delta-io/delta-rs/discussions/599

<div type='discussions-op-text'>

Originally posted by ganesh-gawande, May 9, 2022

Hi,

I am following the documentation at https://github.com/delta-io/delta-rs/blob/main/docs/ADLSGen2-HOWTO.md. I tried many variations of the path, but I am not able to access the Delta Lake.

I receive one of the following errors: "Not a Delta table: No snapshot or version 0 found" or "Invalid object URI".

Here are the paths I have tried in my code; none of them work.

delta = DeltaTable("adls2://{ContainerName}@{StorageAccountName}.dfs.core.windows.net")
delta = DeltaTable("adls2://{StorageAccountName}/{ContainerName}/{Folder1}/{Folder2}/{FileName}.parquet")
delta = DeltaTable("adls2://{StorageAccountName}/{ContainerName}/{DeltaTableNameFromDatabricks}")
delta = DeltaTable("adls2://{StorageAccountName}/{ContainerName}/")
delta = DeltaTable("adls2://{ContainerName}@{StorageAccountName}.dfs.core.windows.net/{ContainerName}/{DeltaTableNameFromDatabricks}")

delta = DeltaTable("abfss://{ContainerName}@{StorageAccountName}.dfs.core.windows.net/{ContainerName}/{DeltaTableNameFromDatabricks}")
delta = DeltaTable("abfss://{ContainerName}@{StorageAccountName}.dfs.core.windows.net/{ContainerName}/")
delta = DeltaTable("abfss://{ContainerName}@{StorageAccountName}.dfs.core.windows.net/")
</div>

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 58

Most upvoted comments

@roeap I confirmed that the issue reported above is resolved in version 0.7.0. I am able to connect to the Azure Storage account by changing the path to az://{containerName}/path and passing the storage options parameter.
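For readers landing here, a minimal sketch of what that working call could look like. The helper names are hypothetical, and the storage option key names (`azure_storage_account_name`, `azure_storage_account_key`) are an assumption that should be checked against the documentation for the installed deltalake version.

```python
from typing import Dict, Tuple


def azure_table_location(
    container: str, table_path: str, account_name: str, account_key: str
) -> Tuple[str, Dict[str, str]]:
    """Build the az:// URI and a storage_options dict for a Delta table."""
    # Strip stray slashes so the URI has exactly one separator between parts.
    uri = f"az://{container.strip('/')}/{table_path.strip('/')}"
    # Assumed key names; verify them against your deltalake version.
    storage_options = {
        "azure_storage_account_name": account_name,
        "azure_storage_account_key": account_key,
    }
    return uri, storage_options


def open_table(container: str, table_path: str, account_name: str, account_key: str):
    """Open the table; requires the deltalake package and network access."""
    # Imported lazily so the helper above works without deltalake installed.
    from deltalake import DeltaTable

    uri, storage_options = azure_table_location(
        container, table_path, account_name, account_key
    )
    return DeltaTable(uri, storage_options=storage_options)
```

Note that the account name moves out of the URI and into the options, which is the main difference from the adls2:// and abfss:// attempts in the original post.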

I am using the release version, which I installed via pip. @roeap Alright, I will be awaiting this feature in the next release, I suppose.

@ganesh-gawande - so the path you should be using is adls2://{StorageAccountName}/{ContainerName}/. After #603 is merged adls2://{StorageAccountName}/{ContainerName} should also work.

However, I also tried loading a delta log with the initial commit files removed, which only works if there is a _last_checkpoint file present. When that file is missing, we see the exact error message you encountered.

@wjones127 @houqp - I do remember the protocol explicitly mentioning lexicographical sort to work with the log. Should we implement that logic, or first make sure that delta needs to support finding checkpoints without that file? Or are we already sure 😃?
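To illustrate the idea, here is a rough sketch of finding the newest checkpoint from a plain directory listing. It is not the delta-rs implementation; the function name is hypothetical, and it assumes log file names start with a 20-digit, zero-padded version, with multi-part checkpoints carrying two extra 10-digit fields.

```python
import re
from typing import List, Optional

# Single-part: <version>.checkpoint.parquet
# Multi-part:  <version>.checkpoint.<part>.<parts>.parquet
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(\.\d{10}\.\d{10})?\.parquet$")


def latest_checkpoint_version(log_file_names: List[str]) -> Optional[int]:
    """Return the newest checkpoint version found in a _delta_log listing."""
    # Zero padding makes lexicographic order match numeric version order,
    # so the first checkpoint seen in reverse-sorted order is the newest.
    for name in sorted(log_file_names, reverse=True):
        match = CHECKPOINT_RE.match(name)
        if match:
            return int(match.group(1))
    return None
```

The cost of this approach is the full listing itself, which is what a pointer file avoids on large tables.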

I guess the core logic from loading a specific version can already largely be reused. Likely we would also want to mirror the logic in our writers to create a checkpoint every ten commits.

In any case, the 00000000000000000000.json file not existing might already be considered a corrupt delta log, even though it should work as long as checkpoint files exist with all relevant information.

@roeap Actually, the first delta entry is not guaranteed to exist. See my update in https://github.com/delta-io/delta/pull/913

Not sure if we are testing that in this repo though.

hmm strange … this seems like a corruption in the delta log to me… when databricks creates a checkpoint it should also create a _last_checkpoint file. The rust implementation relies on either identifying the latest checkpoint via that file or starting from the beginning.
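The behaviour described here can be sketched roughly as follows. This is a simplified model, not the actual Rust code: the function name is hypothetical, the log is represented as an in-memory mapping of file name to content, and the `_last_checkpoint` file is assumed to be a small JSON document with a `version` field.

```python
import json
from typing import Dict

FIRST_COMMIT = "00000000000000000000.json"


def resolve_start_version(log_files: Dict[str, str]) -> int:
    """Pick the version a reader starts from, given {file name: file content}."""
    pointer = log_files.get("_last_checkpoint")
    if pointer is not None:
        # The pointer file names the most recent checkpoint version.
        return int(json.loads(pointer)["version"])
    if FIRST_COMMIT in log_files:
        # No pointer: replay the log from the very first commit.
        return 0
    # Neither entry point exists, which matches the error reported above.
    raise FileNotFoundError("Not a Delta table: No snapshot or version 0 found")
```

A table whose early commits were cleaned up and whose `_last_checkpoint` file is missing falls through to the last branch, even though valid checkpoints may still be present in the directory.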

One way to load the table could be to use the load_version function, i.e. table.load_version(85996). Looking at the Delta specification quickly, it seems to me this scenario (i.e. parts of the log missing) is not something a reader needs to support, but a reader would be resilient to it if the last checkpoint file exists.

If you use the load_version command mentioned above, we search for the closest checkpoint with a lower or equal version, and that "should" work. So it should work with any version higher than that checkpoint's version. The reasoning for all that logic is tables exactly like yours, where listing a directory with tens of thousands of files becomes prohibitively expensive…
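As a rough illustration of that search, the sketch below picks the closest checkpoint at or below a requested version and lists the commits that would still need replaying on top of it. The function name is hypothetical, and the logic is a simplification working on a list of file names, not what delta-rs does internally.

```python
import re
from typing import List, Optional, Tuple

# Group 2 is set for commit files and empty for checkpoint files.
LOG_FILE_RE = re.compile(
    r"^(\d{20})\.(?:(json)|checkpoint(?:\.\d{10}\.\d{10})?\.parquet)$"
)


def plan_load(log_file_names: List[str], target: int) -> Tuple[Optional[int], List[int]]:
    """Return (checkpoint version to start from, commit versions to replay)."""
    checkpoints, commits = set(), set()
    for name in log_file_names:
        match = LOG_FILE_RE.match(name)
        if not match:
            continue  # skip _last_checkpoint and other unrelated files
        version = int(match.group(1))
        (commits if match.group(2) == "json" else checkpoints).add(version)

    # Closest checkpoint whose version is lower than or equal to the target.
    eligible = [v for v in checkpoints if v <= target]
    base = max(eligible) if eligible else None

    # Replay only the commits after the checkpoint, up to the target.
    first = 0 if base is None else base + 1
    replay = sorted(v for v in commits if first <= v <= target)
    return base, replay
```

With a checkpoint at version 85990 and a requested version of 85996, this yields the checkpoint plus commits 85991 through 85996, so the missing early commits never need to be read.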

I’d be interested to know if databricks is able to load that table without specifying a specific version.