dvc: Failure to pull after importing a git repo

Please provide information about your setup: DVC version (i.e. dvc --version), platform, and method of installation (pip, homebrew, pkg (Mac), exe (Windows), DEB (Linux), RPM (Linux)).

DVC 0.86.4, conda installation, Windows.

See https://discuss.dvc.org/t/importing-from-a-git-repo-then-pulling/320 for background.

After importing a directory from a git repository (not a dvc repository), dvc pull fails with ERROR: unexpected error - 'ExternalGitRepo' object has no attribute 'cache'.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 15 (15 by maintainers)

Most upvoted comments

Works like a charm

@charlesbaynham, #3503 implemented it exactly as you described. dvc pull will fetch using the specified rev_lock. dvc import can be used to import (or to force a re-import), and dvc update will update rev_lock if rev moves. We have dvc update --rev to move to a different revision.
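To make the rev / rev_lock distinction in the comment above concrete, here is a toy model. It is only a sketch of the behaviour as described in that comment, not DVC's code: `ImportStage`, `pull_revision`, `update`, the `refs` mapping and the short commit ids are all hypothetical names and values invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ImportStage:
    """Toy stand-in for the repo section of an import .dvc file (not DVC code)."""
    url: str
    rev: str       # what the user asked to track, e.g. a branch name
    rev_lock: str  # the exact commit the data was imported from


def pull_revision(stage):
    # As described above: pull fetches from the pinned commit,
    # regardless of where `rev` points right now.
    return stage.rev_lock


def update(stage, resolve, rev=None):
    # As described above: update re-resolves `rev` (optionally switching
    # to a different one, like `update --rev`) and moves rev_lock to it.
    if rev is not None:
        stage.rev = rev
    stage.rev_lock = resolve(stage.rev)
    return stage


# Hypothetical branch -> commit mapping standing in for the source repo.
refs = {"master": "314393c", "v2": "9f1e2aa"}
stage = ImportStage(url="../git_repo", rev="master", rev_lock="314393c")

refs["master"] = "b7c0d11"       # upstream branch moves to a new commit
before = pull_revision(stage)    # pull still uses the pinned commit
update(stage, refs.__getitem__)  # rev_lock now follows master
after = pull_revision(stage)
update(stage, refs.__getitem__, rev="v2")  # switch to another revision
```

In this model a pull is reproducible until someone explicitly runs an update, which is the point of keeping rev_lock separate from rev.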

@charlesbaynham it actually does put it in the cache. Continuing your script:

$ cat a_dir.dvc
...
  repo:
    url: ../git_repo
    rev_lock: 314393c2a7b82004302d5b7f219e724522db88ca
outs:
- md5: 59446444eae8d3c063f64fefd37b43b5
...
$ tree .dvc/cache 
.dvc/cache
├── 59
│   └── 446444eae8d3c063f64fefd37b43b5.dir
└── d4
    └── 1d8cd98f00b204e9800998ecf8427e
$ cat .dvc/cache/59/446444eae8d3c063f64fefd37b43b5.dir 
[{"md5": "d41d8cd98f00b204e9800998ecf8427e", "relpath": "test.txt"}]

But it shouldn’t push it to the remote storage, indeed.
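The cache listing above can be checked by hand. A minimal sketch, assuming only the layout visible in that output (first two hex characters of the md5 as the directory, the remainder as the file name, and a .dir suffix for directory entries); `cache_path` is a hypothetical helper written for this example, not part of DVC's API.

```python
import hashlib
import json
from pathlib import PurePosixPath


def cache_path(md5, is_dir=False, cache_root=".dvc/cache"):
    """Hypothetical helper: map an md5 to the path layout seen in the listing."""
    name = md5[2:] + (".dir" if is_dir else "")
    return str(PurePosixPath(cache_root) / md5[:2] / name)


# Contents of the .dir file shown above: one entry per file in the directory.
dir_listing = json.loads(
    '[{"md5": "d41d8cd98f00b204e9800998ecf8427e", "relpath": "test.txt"}]'
)

# The md5 recorded under outs in a_dir.dvc points at the .dir entry.
dir_entry = cache_path("59446444eae8d3c063f64fefd37b43b5", is_dir=True)
file_entries = [cache_path(entry["md5"]) for entry in dir_listing]

# d41d8cd9... is the md5 of empty content, so test.txt is an empty file.
empty_md5 = hashlib.md5(b"").hexdigest()
```

Both computed paths match the tree output, which is consistent with the import having already populated the local cache.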

@skshetry in this case, I don’t understand why dvc pull is even trying to pull from this import stage, since the output is already in cache ^ (and workspace).

https://dvc.org/doc/command-reference/pull

I think that’s the current p0 bug.

  • As for whether there are other problems with Git-tracked imported files, we would need to extensively QA re-importing and updating them… And probably dvc fetch/pull and dvc push as well, just in case.

Thanks for the bug report @charlesbaynham ! We’ll take a look ASAP.

@skshetry @efiop Fantastic, thank you! I’ll give it a try.

Able to reproduce with the following tests:

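from os import fspath  # stdlib (Python 3.6+); the original snippet omitted the import


def test_pull_imports(tmp_dir, dvc, scm, git_dir):
    # Source is a plain Git repo: "foo" is tracked by Git only.
    with git_dir.chdir():
        git_dir.scm_gen("foo", "foo", commit="first version")

    dvc.imp(fspath(git_dir), "foo")
    dvc.pull()  # fails with the reported error


def test_pull_imports_erepo_dvc(tmp_dir, dvc, scm, erepo_dir):
    # Source is a DVC repo: "foo" is tracked by DVC.
    with erepo_dir.chdir():
        erepo_dir.dvc_gen("foo", "foo", commit="first version")

    dvc.imp(fspath(erepo_dir), "foo")
    dvc.pull()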

Looks like it never worked before. dvc import already puts the file in the cache, and dvc update should be used for updating. Nevertheless, I’m trying to see what’s going on.