dvc: add --external: fails using Azure remote
Bug Report
Description
I am trying to track existing data in an Azure storage account, following the current documentation.
Reproduce
- `dvc init`
- `dvc remote add azcore azure://core-container`
- `dvc remote add azdata azure://data-container`
- `dvc add --external remote://azdata/existing-data`
Expected
I’m not sure what the expected behavior is, but the command fails with:
`ERROR: unexpected error - : 'azure'`
Environment information
Output of `dvc doctor`:
DVC version: 2.38.1 (pip)
---------------------------------
Platform: Python 3.9.6 on macOS-13.1-x86_64-i386-64bit
Subprojects:
dvc_data = 0.28.4
dvc_objects = 0.14.0
dvc_render = 0.0.15
dvc_task = 0.1.8
dvclive = 1.3.1
scmrepo = 0.1.4
Supports:
azure (adlfs = 2022.11.2, knack = 0.10.1, azure-identity = 1.12.0),
http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s5s1
Caches: local
Remotes: azure, azure
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc, git
Additional Information:
2023-01-04 18:58:46,616 ERROR: unexpected error - : 'azure'
------------------------------------------------------------
Traceback (most recent call last):
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/odbmgr.py", line 65, in __getattr__
return self._odb[name]
KeyError: 'azure'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/cli/__init__.py", line 185, in main
ret = cmd.do_run()
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/cli/command.py", line 22, in do_run
return self.run()
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/commands/add.py", line 53, in run
self.repo.add(
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/utils/collections.py", line 164, in inner
result = func(*ba.args, **ba.kwargs)
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/repo/__init__.py", line 48, in wrapper
return f(repo, *args, **kwargs)
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/repo/scm_context.py", line 156, in run
return method(repo, *args, **kw)
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/repo/add.py", line 190, in add
stage.save(merge_versioned=True)
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/stage/__init__.py", line 469, in save
self.save_outs(
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/stage/__init__.py", line 512, in save_outs
out.save()
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/output.py", line 643, in save
self.odb,
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/output.py", line 450, in odb
odb = getattr(self.repo.odb, odb_name)
File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/odbmgr.py", line 67, in __getattr__
raise AttributeError from exc
AttributeError
------------------------------------------------------------
2023-01-04 18:58:46,711 DEBUG: Version info for developers:
DVC version: 2.38.1 (pip)
---------------------------------
Platform: Python 3.9.6 on macOS-13.1-x86_64-i386-64bit
Subprojects:
dvc_data = 0.28.4
dvc_objects = 0.14.0
dvc_render = 0.0.15
dvc_task = 0.1.8
dvclive = 1.3.1
scmrepo = 0.1.4
Supports:
azure (adlfs = 2022.11.2, knack = 0.10.1, azure-identity = 1.12.0),
http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: azure, azure
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc, git
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-01-04 18:58:46,714 DEBUG: Analytics is enabled.
2023-01-04 18:58:46,911 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/st/05s6bkj55r9cw3hbrrdfvfqh0000gp/T/tmpoxhcmxev']'
2023-01-04 18:58:46,913 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/st/05s6bkj55r9cw3hbrrdfvfqh0000gp/T/tmpoxhcmxev']'
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 24 (13 by maintainers)
Thanks for the input, guys; it helped me settle on the pipeline we will use and work around some of the limitations of using Azure. I think it makes sense to keep this issue open as a feature request, but of course I’ll leave that at your discretion.
No, @rmlopes . Thanks, glad to see that we’ve settled on something after all 😃
I think you can overcome this by introducing an extra stage that lists the files into a `list.txt`, and making the stage that downloads them locally depend on that list. If the storage is append-only and immutable, I would even prefer this approach over `import-url`, since it should be faster.
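A sketch of that two-stage workaround as a `dvc.yaml` (the container name, paths, and Azure CLI commands here are assumptions for illustration, not taken from the thread):

```yaml
stages:
  list-files:
    # Regenerate the manifest of blobs currently in the container.
    # `always_changed` makes DVC re-run this cheap listing step every time.
    cmd: az storage blob list --container-name data-container --query "[].name" --output tsv > list.txt
    always_changed: true
    outs:
      - list.txt
  download:
    # Only re-downloads when the manifest (list.txt) has changed.
    cmd: az storage blob download-batch --source data-container --destination data/
    deps:
      - list.txt
    outs:
      - data/
```

The key idea is that `list.txt` acts as a cheap fingerprint of the remote container, so the expensive download stage is skipped whenever the listing is unchanged.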
The bug with `import-url` downloading things every time should still be fixed, though (cc @dberenbaum).
No, as far as I understand it won’t help. Let’s imagine multiple people want to train something simultaneously: since Azure expects a specific layout in the folder, and it’s the same single folder, I can’t see how to produce two different splits simultaneously in the same location. A better way would be to make the output folder on Azure a parameter (you can use `dvc.yaml` templating to substitute it with a value when the pipeline runs) that each person can specify in `params.yaml`. You should then be prepared to end up with many folders on Azure storage holding different splits (they can be removed after training is done, btw). Let me know if that makes sense; I can explain better or show an example if needed.
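The parameterized-output idea could be sketched like this; the stage name, script, and `out_dir` parameter are all hypothetical:

```yaml
# params.yaml
out_dir: azure://data-container/splits/run-42
```

```yaml
# dvc.yaml
stages:
  make-split:
    # ${out_dir} is substituted from params.yaml by DVC's templating,
    # so each person (or run) can write its split to a distinct folder.
    cmd: python make_split.py --out ${out_dir}
    params:
      - out_dir
```

Here the script itself writes directly to the Azure location, which sidesteps the `--external` output machinery entirely; declaring `out_dir` under `params` means DVC re-runs the stage when the destination changes.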
Yes, it’s clear now what’s going on. Thanks 🙏
@dberenbaum yes, enabling blob versioning is a possibility (I don’t see anything against it)
I looked into the documentation and didn’t find Azure support for external data. It looks like we haven’t implemented it yet, so this is a new feature request rather than a bug? @dberenbaum