dvc: get: fails with head detached and no branches in place

Bug Report

Description

With HEAD detached and no local branches available, dvc get fails:

ERROR: failed to get 'data/data.xml' from '.' - unknown Git revision 'refs/remotes/origin/HEAD'

UPDATE: Skip to https://github.com/iterative/dvc/issues/6154#issuecomment-866717904

Reproduce

  1. git clone https://github.com/iterative/example-get-started
  2. cd example-get-started
  3. git checkout --detach
  4. git branch -D master
  5. git --no-pager branch -v
  6. dvc get . data/data.xml

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.3.0 (pip)
---------------------------------
Platform: Python 3.9.5 on macOS-10.16-x86_64-i386-64bit
Supports: azure, http, https
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s5s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc, git

Additional Information (if any):

Interesting, that dvc pull works fine.

2021-06-11 12:45:41,751 DEBUG: Creating external repo .@None
2021-06-11 12:45:41,751 DEBUG: erepo: git clone '.' to a temporary dir
2021-06-11 12:45:41,966 DEBUG: Removing '/Users/aguschin/Git/iterative/example-get-started2/.gqZZrecSgVYzPDv8JZmpCu'
2021-06-11 12:45:41,966 ERROR: failed to get 'data/data.xml' from '.' - unknown Git revision 'refs/remotes/origin/HEAD'
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/command/get.py", line 37, in _get_file_from_repo
    Repo.get(
  File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/repo/get.py", line 50, in get
    with external_repo(
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/external_repo.py", line 70, in external_repo
    repo = Repo(**repo_kwargs)
  File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/repo/__init__.py", line 155, in __init__
    self.root_dir, self.dvc_dir, self.tmp_dir = self._get_repo_dirs(
  File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/repo/__init__.py", line 103, in _get_repo_dirs
    fs = scm.get_fs(rev) if isinstance(scm, Git) and rev else None
  File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/scm/git/__init__.py", line 353, in get_fs
    resolved = self.resolve_rev(rev)
  File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/scm/git/__init__.py", line 400, in resolve_rev
    return self._resolve_rev(rev)
  File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/scm/git/__init__.py", line 343, in _backend_func
    return func(*args, **kwargs)
  File "/Users/aguschin/.local/share/virtualenvs/yolov5-y85dc7P1/lib/python3.9/site-packages/dvc/scm/git/backend/pygit2.py", line 217, in resolve_rev
    raise RevError(f"unknown Git revision '{rev}'")
dvc.scm.base.RevError: unknown Git revision 'refs/remotes/origin/HEAD'
------------------------------------------------------------
2021-06-11 12:45:41,974 DEBUG: Analytics is enabled.
2021-06-11 12:45:42,049 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/_l/xm_1fknn1479pfh9kg0ttgjc0000gq/T/tmpklkkra_a']'
2021-06-11 12:45:42,051 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/_l/xm_1fknn1479pfh9kg0ttgjc0000gq/T/tmpklkkra_a']'

Initially this happened to me in GitLab CI. It seems that GitLab fetches the repo without any branches and with HEAD detached:

# output received from debugging this in gitlab CI:

$ git --no-pager branch -v
* (HEAD detached at eae5019) eae5019 update environment

$ dvc get . model
ERROR: failed to get 'model' from '.' - unknown Git revision 'refs/remotes/origin/HEAD'

A simple hack works: create a branch with git checkout -b temp-branch before running dvc get:

git checkout -b temp-branch
dvc get . model
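
For CI scripts, the hack above could be made conditional, so that a branch is created only when HEAD is actually detached. This is a minimal sketch, not something DVC provides: ensure_branch is a hypothetical helper name, and temp-branch is simply the name used above.

```shell
# Hypothetical helper: create a branch only when HEAD is detached.
# $1 = repository directory, $2 = name of the branch to create
ensure_branch() {
    # `git symbolic-ref -q HEAD` exits non-zero when HEAD is detached
    if ! git -C "$1" symbolic-ref -q HEAD >/dev/null; then
        git -C "$1" checkout -q -b "$2"
    fi
}

# Intended use in a CI job (requires dvc, so left commented out here):
#   ensure_branch . temp-branch
#   dvc get . model
```

When HEAD already points at a branch, the helper does nothing, so the same script can also run on a developer machine.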

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 21 (19 by maintainers)

Most upvoted comments

@jorgeorpinel The comments in #3590 are specific to list; that change does not apply to get.

This issue is marked P3 because we can potentially change get to not clone the local repo. However, that change will only fix the issue when the user is trying to get the HEAD revision.

The real problem here is that the user’s local repo (.) will only have the specific revisions fetched into the runner env by GitLab/GitHub CI, so using

dvc get . --rev <any other branch or rev>

won’t work at all in these CI environments.

If the user wants consistent dvc get behavior in these CI envs, they should use the proper git remote URL (and not the local repo .) as noted above.

@sukhovvl your issue is a bug which has already been resolved in main and the fix will be available in the next DVC release (see https://github.com/iterative/dvc/pull/7750)

@dberenbaum I think this is a separate issue. If it was a shallow/depth problem it would likely show up as an unknown git revision/object SHA and not the “no active branch” error.

@sukhovvl can you run your diff command with -v in GitLab CI and then post the full error traceback here?

Sorry, it seems I didn’t know how to use this. dvc pull yolov5s.pt.dvc works fine, while dvc pull yolov5s.pt fails:

ERROR: failed to pull data from the cloud - 'yolo5s.pt' does not exist as an output or a stage name in 'dvc.yaml': Stage 'yolo5s.pt' not found inside 'dvc.yaml' file

It should be supported if I remember correctly. I’ll take a look into it tomorrow.

@jorgeorpinel

What change would you suggest here? (What would you expect?)

I was expecting that artefacts related to the commit eae5019 will be downloaded (in this case they are specified in dvc.lock).

push/pull don’t need Git at all. They use the metafiles present in the project (and config file).

Maybe dvc get . should use the same mechanics?

@pmrowla, thank you, I now think this is the best option. For the record, in GitLab CI this will look like:

dvc get $CI_PROJECT_URL --rev $CI_COMMIT_SHA path-to-file
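
A job script that also runs outside CI could choose the source repo depending on whether GitLab's predefined variables CI_PROJECT_URL and CI_COMMIT_SHA are set. The sketch below only prints the command instead of running it. dvc_get_command is a hypothetical helper name, and it assumes the project URL can be cloned from the job without extra credentials.

```shell
# Hypothetical helper: print the dvc get command for the current environment.
# $1 = path of the DVC-tracked file or directory
dvc_get_command() {
    if [ -n "${CI_PROJECT_URL:-}" ] && [ -n "${CI_COMMIT_SHA:-}" ]; then
        # In GitLab CI: use the Git remote URL and pin the exact commit
        printf 'dvc get %s --rev %s %s\n' "$CI_PROJECT_URL" "$CI_COMMIT_SHA" "$1"
    else
        # Elsewhere: fall back to the local repo
        printf 'dvc get . %s\n' "$1"
    fi
}

dvc_get_command path-to-file
```

The fallback branch still goes through the local repo, so it keeps the limitation described in this issue when HEAD is detached and no branches exist.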

dvc.scm.base.RevError: unknown Git revision 'refs/remotes/origin/HEAD'

I think that get clones the source repo (even if it’s .) and requires a --rev which is by default HEAD. The workaround is to use dvc get . data/data.xml --rev 00071e8 instead.

I think this is a bug. The issue here is that the repo being cloned does have a HEAD, but it is detached, meaning it points to a specific revision and not to a branch. The git remote-tracking refs (like refs/remotes/origin/HEAD) are used locally to keep “copies” of refs in the cloned repo, but apparently refs/remotes/origin/HEAD isn’t created when you clone a repo that is in a detached state.
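
The missing ref can be checked with plain Git, without DVC. The sketch below builds a throwaway repo with a detached HEAD and no branches, clones it, and looks for refs/remotes/origin/HEAD in the clone. The directory names and the commit identity are placeholders chosen for the example. If the explanation above holds, the ref should be reported as missing.

```shell
# Throwaway working area; src and clone are placeholder directory names
work="$(mktemp -d)"
git init -q "$work/src"
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Detach HEAD, then delete the only branch (its name depends on Git config)
branch="$(git -C "$work/src" symbolic-ref --short HEAD)"
git -C "$work/src" checkout -q --detach
git -C "$work/src" branch -q -D "$branch"

# Clone the repo, as dvc get does with '.', and look for the remote-tracking HEAD
git clone -q "$work/src" "$work/clone"
if git -C "$work/clone" rev-parse --verify --quiet refs/remotes/origin/HEAD >/dev/null; then
    origin_head="present"
else
    origin_head="missing"
fi
echo "refs/remotes/origin/HEAD is $origin_head"
```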

This is a pretty specific edge case though, and I think cloning a repo with no branches may still cause some other issues in DVC anyways, even if we fix the origin/HEAD issue.

We may need to rethink how dvc get for local repos should work (and just skip the clone entirely), similar to how we should handle dvc list with local repos without cloning as well (https://github.com/iterative/dvc/issues/3590).