dvc: Pull / Push failed for Azure remote storage - Authentication issue with DVC 2.0.x

Bug Report

Pull / Push failed for Azure remote storage - Authentication issue with DVC 2.0.x

When using standard commands (push, pull, …) with an Azure remote storage, the following error occurs:

ERROR: failed to pull data from the cloud - Authentication to Azure Blob Storage via connection string failed.

Description

Prerequisites:

dvc init was run on a Git repo, and several Azure remote storages were configured using:

dvc remote add myremote1 "azure://<CONTAINER_NAME>/DVC"
dvc remote modify myremote1 connection_string "BlobEndpoint=<URL_STORAGE>;SharedAccessSignature=<SAS_KEY>"

The repo already contained tracked data (the .dvc files were already present).

Reproduce

  1. git pull
  2. dvc pull -r myremote1

Expected

+-------------------------------------------+
|                                           |
|     Update available 1.11.16 -> 2.0.1     |
|      Run `pip install dvc --upgrade`      |
|                                           |
+-------------------------------------------+

A       data/models-3/                                                                                                                       
A       data/training-set/                                                                                                                 
A       data/models/                                                                                                                       
A       data/dvc-training-set/                                                                                                             
4 files added and 174 files fetched 

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.0.3 (pip)
---------------------------------
Platform: Python 3.8.5 on Linux-5.4.0-1039-azure-x86_64-with-glibc2.29
Supports: azure, http, https
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: azure, azure, azure
Workspace directory: ext4 on /dev/sdc
Repo: dvc, git

Additional Information (if any):

When installing the previous version (1.11.16), it works.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 25 (10 by maintainers)

Most upvoted comments

Well that would be my first pull request then. Let me give that a try in the morning!

Turns out I was using the root of the Azure container with container-level credentials. Adding a subpath like azure://<container>/<subpath> in .dvc/config seems to have solved the issue. This is probably due to the adlfs find() implementation mentioned above.
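For illustration, the corresponding remote entry in .dvc/config might look like this (the container and subpath names are placeholders, not from the original report):

['remote "myremote1"']
    url = azure://mycontainer/some/subpath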

Thanks for triaging the error @jpvlerbe; it makes sense that that line fails under container-level privileged tokens. I think it would be better for us to just do a try/except to check the container's existence rather than listing all containers and checking whether the name is present in that list. If you'd like to submit a pull request, that would be amazing! (A quick suggestion of mine would be to replace the if condition there with a try/except statement that calls fs.info(bucket_name) and, if it fails with FileNotFoundError, executes the mkdir().)
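A minimal sketch of that suggestion, assuming adlfs's AzureBlobFileSystem; ensure_container() is a hypothetical stand-in for the surrounding DVC code, not the actual function in dvc/fs/azure.py:

from adlfs import AzureBlobFileSystem

def ensure_container(fs: AzureBlobFileSystem, bucket_name: str) -> None:
    # Hypothetical helper: probe the container directly instead of listing
    # every container in the account, which a container-scoped SAS token
    # is not permitted to do.
    try:
        fs.info(bucket_name)
    except FileNotFoundError:
        fs.mkdir(bucket_name)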

I think I have figured it out. In Azure Blob Storage you have an account, and in that account you create containers. SAS tokens can be scoped to a single container.

In configuring Azure in DVC I have done (as per the documentation):

dvc remote add -d myremote azure://mycontainer/path
dvc remote modify --local myremote connection_string 'mystring'

Checking the code and testing in an interactive Python session, I can replicate the error by doing:

from adlfs import AzureBlobFileSystem
fs = AzureBlobFileSystem(connection_string=mystring)
fs.ls("/")

At this point no reference has been made to the container, and I think it tries to list all containers, which I indeed have no access to.

When I replace the last line with fs.ls("<mycontainer>"), it successfully lists the contents of that container. So the issue is in line 116 of dvc/fs/azure.py.

Can you run pip freeze and post the result here (both in the working environment with DVC v1 and in the broken environment with DVC v2)? It seems like a problem with one of our dependencies.

@rbreunev any updates?

OK, I just updated to 2.0.5 and it seems to solve the issue. I am waiting for confirmation from all my users before closing the bug.