dvc: get: fails with head detached and no branches in place
Bug Report
Description
With HEAD detached and no branches available, dvc get fails:
ERROR: failed to get 'data/data.xml' from '.' - unknown Git revision 'refs/remotes/origin/HEAD'
UPDATE: Skip to https://github.com/iterative/dvc/issues/6154#issuecomment-866717904
Reproduce
- git clone https://github.com/iterative/example-get-started
- cd example-get-started
- git checkout --detach
- git branch -D master
- git --no-pager branch -v
- dvc get . data/data.xml
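The Git side of this reproduction can be checked without DVC. Below is a minimal Python sketch. It assumes only that a git binary is on PATH, and the helper names _git and origin_head_after_clone are hypothetical. It builds a throwaway repo, detaches HEAD, and deletes the only branch. It then clones the repo by path, which is roughly what the "erepo: git clone '.'" debug line below suggests DVC does. Finally, it reports whether the clone has refs/remotes/origin/HEAD.

```python
import os
import subprocess
import tempfile

# Fixed identity so `git commit` works on machines without user.name/email set.
_GIT_ENV = dict(
    os.environ,
    GIT_AUTHOR_NAME="demo",
    GIT_AUTHOR_EMAIL="demo@example.com",
    GIT_COMMITTER_NAME="demo",
    GIT_COMMITTER_EMAIL="demo@example.com",
)


def _git(*args, cwd):
    """Run a git command in `cwd` and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, env=_GIT_ENV,
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def origin_head_after_clone():
    """Hypothetical helper: replay the reproduction steps, then clone by path.

    Returns True if the clone has refs/remotes/origin/HEAD, False otherwise.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.mkdir(src)
        _git("init", "-q", cwd=src)
        with open(os.path.join(src, "README"), "w") as f:
            f.write("demo\n")
        _git("add", "README", cwd=src)
        _git("commit", "-q", "-m", "initial", cwd=src)
        branch = _git("rev-parse", "--abbrev-ref", "HEAD", cwd=src)
        _git("checkout", "-q", "--detach", cwd=src)  # git checkout --detach
        _git("branch", "-D", branch, cwd=src)        # git branch -D master
        dst = os.path.join(tmp, "clone")
        _git("clone", "-q", src, dst, cwd=tmp)       # like the temporary erepo clone
        # show-ref --verify --quiet exits non-zero if the ref does not exist.
        probe = subprocess.run(
            ["git", "show-ref", "--verify", "--quiet", "refs/remotes/origin/HEAD"],
            cwd=dst, capture_output=True,
        )
        return probe.returncode == 0
```

Calling origin_head_after_clone() should return False. That is consistent with the maintainer explanation further down: git clone has no branch to point origin/HEAD at, so it never creates that ref.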
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.3.0 (pip)
---------------------------------
Platform: Python 3.9.5 on macOS-10.16-x86_64-i386-64bit
Supports: azure, http, https
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s5s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc, git
Additional Information (if any):
Interestingly, dvc pull works fine.
2021-06-11 12:45:41,751 DEBUG: Creating external repo .@None
2021-06-11 12:45:41,751 DEBUG: erepo: git clone '.' to a temporary dir
2021-06-11 12:45:41,966 DEBUG: Removing '/Users/aguschin/Git/iterative/example-get-started2/.gqZZrecSgVYzPDv8JZmpCu'
2021-06-11 12:45:41,966 ERROR: failed to get 'data/data.xml' from '.' - unknown Git revision 'refs/remotes/origin/HEAD'
------------------------------------------------------------
Traceback (most recent call last):
File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/command/get.py", line 37, in _get_file_from_repo
Repo.get(
File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/repo/get.py", line 50, in get
with external_repo(
File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/contextlib.py", line 117, in __enter__
return next(self.gen)
File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/external_repo.py", line 70, in external_repo
repo = Repo(**repo_kwargs)
File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/repo/__init__.py", line 155, in __init__
self.root_dir, self.dvc_dir, self.tmp_dir = self._get_repo_dirs(
File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/repo/__init__.py", line 103, in _get_repo_dirs
fs = scm.get_fs(rev) if isinstance(scm, Git) and rev else None
File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/scm/git/__init__.py", line 353, in get_fs
resolved = self.resolve_rev(rev)
File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/scm/git/__init__.py", line 400, in resolve_rev
return self._resolve_rev(rev)
File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/scm/git/__init__.py", line 343, in _backend_func
return func(*args, **kwargs)
File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/scm/git/backend/pygit2.py", line 217, in resolve_rev
raise RevError(f"unknown Git revision '{rev}'")
dvc.scm.base.RevError: unknown Git revision 'refs/remotes/origin/HEAD'
------------------------------------------------------------
2021-06-11 12:45:41,974 DEBUG: Analytics is enabled.
2021-06-11 12:45:42,049 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/_l/xm_1fknn1479pfh9kg0ttgjc0000gq/T/tmpklkkra_a']'
2021-06-11 12:45:42,051 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/_l/xm_1fknn1479pfh9kg0ttgjc0000gq/T/tmpklkkra_a']'
This initially happened to me in GitLab CI. It seems GitLab fetches the repo without any branches and with HEAD detached:
# output received from debugging this in gitlab CI:
$ git --no-pager branch -v
* (HEAD detached at eae5019) eae5019 update environment
$ dvc get . model
ERROR: failed to get 'model' from '.' - unknown Git revision 'refs/remotes/origin/HEAD'
A simple workaround is to create a temporary branch first:
git checkout -b temp-branch
dvc get . model
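The temp-branch workaround can be made conditional so that it does nothing on runners that already have a branch checked out. This is a minimal Python sketch that assumes git is on PATH. The helpers head_is_detached and ensure_branch are hypothetical names, and temp-branch is simply the name used above. The sketch relies on git symbolic-ref -q HEAD exiting non-zero when HEAD is detached.

```python
import subprocess


def head_is_detached(repo_dir="."):
    """True if HEAD is not a symbolic ref (i.e. it is detached).

    `git symbolic-ref -q HEAD` exits 1 silently for a detached HEAD. It also
    exits non-zero outside a Git repo, which this sketch does not distinguish.
    """
    proc = subprocess.run(
        ["git", "symbolic-ref", "-q", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True,
    )
    return proc.returncode != 0


def ensure_branch(repo_dir=".", name="temp-branch"):
    """Create and check out `name` only when HEAD is detached.

    Returns True if a branch was created, False if one was already checked out.
    """
    if not head_is_detached(repo_dir):
        return False
    subprocess.run(
        ["git", "checkout", "-q", "-b", name],
        cwd=repo_dir, check=True, capture_output=True,
    )
    return True
```

In a CI job, you would call ensure_branch() from the repo root and then run dvc get . model as before. The temporary clone then has a branch for origin/HEAD to point at.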
About this issue
- State: closed
- Created 3 years ago
- Comments: 21 (19 by maintainers)
@jorgeorpinel The comments in #3590 are specific to list; that change does not apply to get.
This issue is marked P3 because we could potentially change get to not clone the local repo. However, that change would only fix the issue when the user is trying to get the HEAD revision. The real problem is that the user's local repo (.) will only contain the specific revisions fetched into the runner environment by GitLab/GitHub CI, so using [...] won't work at all in these CI environments.
If the user wants consistent dvc get behavior in these CI environments, they should use the proper Git remote URL (and not the local repo .), as noted above.
@sukhovvl your issue is a bug that has already been resolved in main, and the fix will be available in the next DVC release (see https://github.com/iterative/dvc/pull/7750).
@dberenbaum I think this is a separate issue. If it were a shallow/depth problem, it would likely show up as an unknown Git revision/object SHA rather than the "no active branch" error.
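This is a sketch of the maintainer suggestion above for GitLab CI: pass the Git remote URL to dvc get instead of the local repo (.). It assumes GitLab's predefined CI_REPOSITORY_URL and CI_COMMIT_SHA variables and uses the --rev and -o options of dvc get. dvc_get_command is a hypothetical helper that only builds the argument list.

```python
import os


def dvc_get_command(path, env=None, out=None):
    """Hypothetical helper: build a `dvc get` argv that clones the real remote.

    Assumes GitLab CI's predefined variables:
      CI_REPOSITORY_URL: clone URL of the project (includes a job token)
      CI_COMMIT_SHA: commit the pipeline is running for
    """
    env = os.environ if env is None else env
    cmd = ["dvc", "get", "--rev", env["CI_COMMIT_SHA"]]
    if out is not None:
        cmd += ["-o", out]
    # Positional arguments: repository URL, then the path inside that repo.
    cmd += [env["CI_REPOSITORY_URL"], path]
    return cmd
```

Inside a job, this could be run as subprocess.run(dvc_get_command("model"), check=True). Passing --rev with the pipeline's commit SHA pins dvc get to the commit being tested rather than the remote's default branch. Because CI_REPOSITORY_URL carries credentials, avoid echoing the resulting command into job logs.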
@sukhovvl can you run your diff command with -v in GitLab CI and then post the full error traceback here?
It should be supported if I remember correctly. I'll take a look into it tomorrow.
@jorgeorpinel I was expecting that the artefacts related to commit eae5019 would be downloaded (in this case they are specified in dvc.lock). Maybe dvc get . should use the same mechanics?
@pmrowla, thank you, I think this is now the best option. For the record, in GitLab CI this will look like [...]
I think this is a bug. The issue here is that the repo being cloned does have a HEAD, but it is detached, meaning it points to a specific revision and not to a branch. The Git remote-tracking refs (like refs/remotes/origin/HEAD) are used locally to keep "copies" of the remote's refs in the cloned repo, but apparently refs/remotes/origin/HEAD isn't created when you clone a repo that is in a detached state.
This is a pretty specific edge case, though, and I think cloning a repo with no branches may still cause other issues in DVC anyway, even if we fix the origin/HEAD issue. We may need to rethink how dvc get should work for local repos (and just skip the clone entirely), similar to how we should handle dvc list with local repos without cloning as well (https://github.com/iterative/dvc/issues/3590).
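Here is one possible fallback for the case described above, as a pure-Python sketch. When the temporary clone has no refs/remotes/origin/HEAD, it uses the commit that the clone's detached HEAD points at instead of failing. This is hypothetical and not necessarily how DVC resolved the issue. The refs mapping stands in for whatever the Git backend exposes, and RevError is a local stand-in for dvc.scm.base.RevError from the traceback.

```python
from typing import Mapping, Optional


class RevError(Exception):
    """Local stand-in for dvc.scm.base.RevError."""


def default_rev(refs: Mapping[str, str], head_sha: Optional[str]) -> str:
    """Hypothetical resolver for the revision a temporary clone should use.

    refs: ref name -> commit SHA (e.g. parsed from `git show-ref`).
    head_sha: commit the clone's HEAD points at (detached or not), or None.
    """
    # Normal case: origin/HEAD exists and points at the source's branch tip.
    sha = refs.get("refs/remotes/origin/HEAD")
    if sha is not None:
        return sha
    # Detached source with no branches: git clone writes no origin/HEAD,
    # but the clone's own HEAD is still detached at the source's commit.
    if head_sha is not None:
        return head_sha
    raise RevError("unknown Git revision 'refs/remotes/origin/HEAD'")
```

With this fallback, dvc get . would target the commit the CI runner checked out, such as eae5019 in the GitLab case above.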