dvc: checkout: consistency in handling files that are missing version info

dvc checkout (and dvc pull, since it uses checkout internally) errors out if version info is missing, while dvc push only warns, which is a confusing inconsistency. For example, it makes the CML folks use || true for dvc pull (https://github.com/DavidGOrtega/cml-dvc-test/runs/2438634469, CC @DavidGOrtega), as they don’t have dvc.lock on the initial run.
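
For reference, the || true workaround can be written as a small CI helper that at least leaves a trace when the pull fails. This is only a sketch: the function name pull_tolerant and the warning text are made up for illustration and are not part of DVC or CML.

```shell
#!/bin/sh
# Hypothetical CI helper (not part of DVC or CML): run `dvc pull`, but keep
# the job going if it fails, as `dvc pull || true` does, while printing a
# warning to stderr so the failure is not completely silent.
pull_tolerant() {
    if dvc pull "$@"; then
        return 0
    fi
    echo "warning: dvc pull failed (possibly no dvc.lock yet); continuing" >&2
    return 0
}
```

Calling pull_tolerant in place of dvc pull keeps the job green on the initial run. Like || true, it also hides genuine pull failures, which is part of why the inconsistency is a problem.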

Current dvc push behaviour with warnings is also quite annoying and doesn’t scale well, as it might get way too noisy for a big pipeline. So we should probably agree on one common behaviour for all such cases that makes sense to our users. CC @dberenbaum

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 21 (20 by maintainers)

Most upvoted comments

related #4746

@karajan1001 This is the scenario as far as I remember:

  • dvc add data and commit .dvc file.
  • dvc stage add and commit dvc.yaml (the stage does not get run).
  • git push and dvc push, which warns the user that version info is missing for the outputs of the dvc.yaml stage.
  • On the remote server in CML, git pull and dvc pull; the latter throws an error because version info is missing for the outputs of the dvc.yaml stage.

EDIT: A much simpler summary is that no dvc.lock file exists.
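
The steps above can be sketched as a script. Several details are assumptions that do not come from the thread: a local directory stands in for the DVC remote, and the stage name train, the file names, and the stage command are placeholders. The script also assumes a Git identity is already configured and that the work directory is given as an absolute path. The exact messages depend on the DVC version, so none are shown here.

```shell
#!/bin/sh
# Sketch of the scenario: a stage is defined but never run, so no dvc.lock
# exists when `dvc push` and `dvc pull` are called.
# Assumed for illustration: local directory remote, stage name `train`,
# file names `data` and `model`.
reproduce_missing_lock() {
    workdir=$1   # absolute path to a scratch directory
    mkdir -p "$workdir/repo" "$workdir/remote" || return 1
    cd "$workdir/repo" || return 1

    git init -q .
    dvc init -q
    dvc remote add -d local "$workdir/remote"

    # 1. Track a data file and commit its .dvc file.
    echo "raw data" > data
    dvc add data
    git add -A
    git commit -q -m "track data"

    # 2. Define a stage without running it; dvc.yaml is written, dvc.lock is not.
    dvc stage add -n train -d data -o model "cp data model"
    git add -A
    git commit -q -m "add stage"

    # 3. Push: per the thread, this only warns about missing version info.
    dvc push

    # 4. Pull: per the thread, this errors out for the same stage outputs.
    dvc pull
}
```

Running reproduce_missing_lock "$(mktemp -d)" and comparing the exit codes of the last two commands should show the push/pull inconsistency on affected versions.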

I don’t know why it deletes the file when version info is missing. DVC behaves like this in a lot of places, which I’d like to fix (e.g. dvc add removing stage files on failure). Pinging @efiop.

> Current dvc push behaviour with warnings is also quite annoying and doesn’t scale well, as it might get way too noisy for a big pipeline.

Isn’t this what --quiet is for? It would be great to have some nice summary option that would be a compromise between no output and overly verbose output, but I think that’s separate from the issue here, which is about an inconsistency between push and pull. If I have done dvc stage add ... && dvc push, then dvc pull right after should not fail.