dvc: import: does not work with repositories using params

Importing data from a parametrized repository does not work.

Reproduction:

#!/bin/bash

set -ex

# Fall back to /tmp when TMPDIR is unset; a bare `pushd` would fail under `set -e`.
pushd "${TMPDIR:-/tmp}"

wsp=test_wspace
imp_repo=import_repo
rep=test_repo

rm -rf $wsp && mkdir $wsp && pushd $wsp
base=$(pwd)
mkdir $base/$imp_repo
mkdir $base/$rep

pushd $base/$imp_repo
git init
dvc init

echo data >> data
echo "m: 1" >> params.yaml

dvc add data
dvc run -d data -p m -o out -n train cp data out
git add -A
git commit -am "init"

pushd $base/$rep
git init
dvc init

dvc import $base/$imp_repo out -o my_out -v

Result:

dvc.parsing.ResolveError: failed to parse 'vars' in 'dvc.yaml': 'params.yaml' does not exist

At first sight, the culprit seems to be dvc/repo/stage.py::_maybe_collect_from_dvc_yaml, which passes dvc.yaml to StageLoad.load_all, whereas it seems that it should pass a path to the file. https://github.com/iterative/dvc/blob/d882e21b9b013f69abff960a8bad62d822024b59/dvc/repo/stage.py#L66

Kudos to @oslobowl for discovering the problem.

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 7
  • Comments: 24 (12 by maintainers)

Most upvoted comments

@mnrozhkov downgrading to 2.8.1 seems to help; we are trying to pinpoint the exact change that introduced the bug.

Looks like the issue is how we use wdir instead of git repo’s root, which happens here: https://github.com/iterative/dvc/blob/d882e21b9b013f69abff960a8bad62d822024b59/dvc/parsing/context.py#L402
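To make the failure mode concrete, here is a minimal sketch of what goes wrong when a file is looked up relative to the wrong base directory. It is not DVC's code: `repo_root`, `other_wdir`, and the `resolve_params` helper are hypothetical names used only for illustration, and the "not found" message merely mirrors the wording of the error above.

```shell
#!/bin/bash
# Illustration only: params.yaml lives at the root of one directory tree,
# but is looked up relative to an unrelated working directory.
set -e

repo_root=$(mktemp -d)   # stands in for the imported repo's root (hypothetical)
other_wdir=$(mktemp -d)  # stands in for the wdir that gets used instead (hypothetical)
echo "m: 1" > "$repo_root/params.yaml"

# resolve_params BASE: report whether params.yaml exists under BASE.
resolve_params() {
    if [ -f "$1/params.yaml" ]; then
        echo "found"
    else
        echo "'params.yaml' does not exist"
    fi
}

from_wdir=$(resolve_params "$other_wdir")
from_root=$(resolve_params "$repo_root")
echo "relative to wdir:      $from_wdir"
echo "relative to repo root: $from_root"

rm -rf "$repo_root" "$other_wdir"
```

The same relative name resolves or fails depending only on the base directory, which is consistent with the ResolveError reported above.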

This is definitely a serious bug, and may need serious consideration of how to fix it on the filesystem side. Assigning it to myself.

@efiop Hi again 😃 So, a colleague of mine figured out that we forgot to use the --upgrade flag in the pip install you posted above. Now everything works 😃 Thank you very much for helping! Have a great weekend!

@efiop Thanks for the explanation Ruslan 😃 we are cheering for you and we are very grateful for the great support we get with issues 😃

I tried what you said, and I still get the same error unfortunately:

(.venv) asdf@1234:/workspaces/my_pipeline$ dvc update -v data/output.gz.parquet.dvc 
2022-06-09 18:32:09,558 DEBUG: Creating external repo https://github.com/me/my_repo.git@None
2022-06-09 18:32:09,558 DEBUG: erepo: git clone 'https://github.com/me/my_repo.git' to a temporary dir
2022-06-09 18:32:09,840 ERROR: failed update data - Failed to clone repo 'https://github.com/me/my_repo.git' to '/tmp/tmppnijk_gydvc-clone'
------------------------------------------------------------
Traceback (most recent call last):
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 196, in clone
    repo = clone_from()
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dulwich/porcelain.py", line 443, in clone
    return client.clone(
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dulwich/client.py", line 747, in clone
    result = self.fetch(path, target, progress=progress, depth=depth)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dulwich/client.py", line 824, in fetch
    result = self.fetch_pack(
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dulwich/client.py", line 2079, in fetch_pack
    refs, server_capabilities, url = self._discover_references(
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dulwich/client.py", line 1938, in _discover_references
    resp, read = self._http_request(url, headers)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dulwich/client.py", line 2219, in _http_request
    raise HTTPUnauthorized(resp.getheader("WWW-Authenticate"), url)
dulwich.client.HTTPUnauthorized: No valid credentials provided

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dvc/scm.py", line 145, in clone
    git = Git.clone(url, to_path, progress=pbar.update_git, **kwargs)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/scmrepo/git/__init__.py", line 143, in clone
    backend.clone(url, to_path, **kwargs)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 199, in clone
    raise CloneError(url, to_path) from exc
scmrepo.exceptions.CloneError: Failed to clone repo 'https://github.com/me/my_repo.git' to '/tmp/tmppnijk_gydvc-clone'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dvc/commands/update.py", line 16, in run
    self.repo.update(
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dvc/repo/update.py", line 34, in update
    stage.update(rev, to_remote=to_remote, remote=remote, jobs=jobs)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dvc/stage/__init__.py", line 439, in update
    update_import(
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dvc/stage/imports.py", line 21, in update_import
    stage.deps[0].update(rev=rev)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dvc/dependency/repo.py", line 85, in update
    with self._make_repo(locked=False) as repo:
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dvc/external_repo.py", line 39, in external_repo
    path = _cached_clone(url, rev, for_write=for_write)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dvc/external_repo.py", line 169, in _cached_clone
    clone_path, shallow = _clone_default_branch(url, rev, for_write=for_write)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/funcy/decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/funcy/flow.py", line 274, in wrap_with
    return call()
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/funcy/decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dvc/external_repo.py", line 239, in _clone_default_branch
    git = clone(url, clone_path)
  File "/workspaces/my_pipeline/.venv/lib/python3.8/site-packages/dvc/scm.py", line 150, in clone
    raise CloneError(str(exc))
dvc.scm.CloneError: Failed to clone repo 'https://github.com/me/my_repo.git' to '/tmp/tmppnijk_gydvc-clone'
------------------------------------------------------------
2022-06-09 18:32:09,865 DEBUG: Analytics is enabled.
2022-06-09 18:32:09,905 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpxn7_j_t9']'
2022-06-09 18:32:09,907 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpxn7_j_t9']'

@efiop Yes, sorry. Using 2.11.0 I just get another error: ERROR: failed update data - Failed to clone repo 'https://github.com/me/my_pipeline.git' to '/tmp/tmpb9j9yyrudvc-clone'

bisect points me to e8e1c76504895f846ef5724c02f5414f4a251f7f as being the culprit.
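For anyone who wants to repeat this kind of search, here is a rough sketch of `git bisect run` on a throwaway repository. Everything in it is a toy assumption: the five commits, the `status` and `counter` files, and the `grep` check are hypothetical stand-ins. In the real case the known-good and known-bad DVC revisions would be the endpoints, and the reproduction script from the top of this issue would be the test command.

```shell
#!/bin/bash
# Toy demonstration of `git bisect run`; not the actual DVC bisect session.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "bisect@example.com"
git config user.name "bisect demo"

# Five commits; the third one introduces the "bug" (status flips to fail).
for i in 1 2 3 4 5; do
    if [ "$i" -ge 3 ]; then echo fail > status; else echo pass > status; fi
    echo "$i" > counter
    git add -A
    git commit -q -m "commit $i"
done

good=$(git rev-list --max-parents=0 HEAD)  # first commit, known good
expected=$(git rev-parse HEAD~2)           # commit 3, kept only to check the result

# The test command must exit 0 on good commits and non-zero on bad ones.
git bisect start HEAD "$good" >/dev/null
git bisect run grep -q pass status >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit: $first_bad"

cd /
rm -rf "$repo"
```

`git bisect run` checks out a midpoint commit, runs the command, and narrows the range based on its exit status until a single first bad commit remains.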