dvc: Not able to push data of dependencies to the remote

Bug Report

Description

I’m not able to push the data for dependencies listed in dvc.yaml to the remote.

Reproduce

[Screenshot of …/dvc.yaml, 2021-06-30]

$ dvc repro
$ dvc add ../../data/my_data.csv
$ dvc push ../../data/my_data.csv

Error: failed to push data to the cloud - ‘…/…/data/my_data.csv’ does not exist as an output or a stage name in ‘dvc.yaml’: Stage ‘…/…/data/my_data.csv’ not found inside ‘dvc.yaml’ file

Expected

my_data.csv is uploaded to the cloud successfully.

Environment information

  • dvc 2.4.3

Output of dvc doctor:

DVC version: 2.4.3 (conda)
---------------------------------
Platform: Python 3.8.10 on macOS-10.15.3-x86_64-i386-64bit
Supports: http, https
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s5
Caches: local
Remotes: local
Workspace directory: apfs on /dev/disk1s5
Repo: dvc, git

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

@Christoph-1 using the rules I suggested, my_data.csv will be ignored by the first rule data/**.

The subdirectory exclusion !data/**/ only applies to subdirectory paths (which end in a trailing slash) and essentially just forces git to traverse into subdirectories so that it can see the .dvc files. All files inside subdirectories are still ignored by the first rule.

data/folder/my_data.csv does not match !data/**/ since my_data.csv is not a directory.

So the way the rules work together is:

# ignore every file inside data/
data/**

# force traversal into subdirectories within data/ that would normally be skipped entirely due to the first rule
!data/**/

# un-ignore .dvc files inside data/
!data/**/*.dvc

Another way to think about it would be that these rules are equivalent to the following for data/folder/:

# ignore the contents of data (non-recursive)
data/*

# un-ignore data/folder/
!data/folder/

# ignore the contents of data/folder/ (non-recursive)
data/folder/*

# un-ignore .dvc files within data/folder/
!data/folder/*.dvc

# ... continue repeating this pattern for subdirs inside data/folder/
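
As a rough check of that equivalence, the sketch below writes each rule set to .gitignore in a throwaway repository and compares what git check-ignore reports for the two files in data/folder/. The temporary directory, the file names, and the ignored_paths helper are assumptions made for illustration, and the comparison only covers paths inside data/folder/.

```shell
#!/bin/sh
set -e

# Throwaway repo with one data file and its .dvc file in data/folder/.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p data/folder
touch data/folder/my_data.csv data/folder/my_data.csv.dvc

# Hypothetical helper: print whichever of the two paths git ignores.
# check-ignore exits 1 when nothing is ignored, hence the "|| true".
ignored_paths() {
    git check-ignore data/folder/my_data.csv data/folder/my_data.csv.dvc || true
}

# Recursive form of the rules.
printf '%s\n' 'data/**' '!data/**/' '!data/**/*.dvc' > .gitignore
recursive=$(ignored_paths)

# Expanded, non-recursive form for data/folder/.
printf '%s\n' 'data/*' '!data/folder/' 'data/folder/*' '!data/folder/*.dvc' > .gitignore
expanded=$(ignored_paths)

if [ "$recursive" = "$expanded" ]; then
    echo "same result: $recursive"
else
    echo "rule sets differ"
fi
```

In both cases only data/folder/my_data.csv should be reported as ignored.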

You can verify this behavior yourself using git check-ignore:

$ tree
.
└── data
    ├── foo
    ├── foo.dvc
    └── subdir
        ├── bar
        └── bar.dvc

2 directories, 4 files

$ cat .gitignore
/data/**
!/data/**/
!**/*.dvc

$ git check-ignore data/foo data/foo.dvc data/subdir/bar data/subdir/bar.dvc
data/foo
data/subdir/bar

You can see that only the .dvc files are excluded from the ignore rules. The data files (foo and bar) remain ignored by the first rule.
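
If you want to repeat this check without touching a real project, a minimal sketch along these lines should work. It recreates the same tree and the same three rules in a temporary repository; the temporary directory and the ignored variable are assumptions for illustration only.

```shell
#!/bin/sh
set -e

# Recreate the tree from the example in a throwaway repo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p data/subdir
touch data/foo data/foo.dvc data/subdir/bar data/subdir/bar.dvc

# The same three rules as in the .gitignore shown above.
printf '%s\n' '/data/**' '!/data/**/' '!**/*.dvc' > .gitignore

# check-ignore prints only the paths that are ignored.
ignored=$(git check-ignore data/foo data/foo.dvc data/subdir/bar data/subdir/bar.dvc || true)
echo "$ignored"
```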

@Christoph-1 to properly exclude your .dvc files from the ignore rules you will need something like:

data/**
!data/**/
!data/**/*.dvc

The issue is that git will not traverse into subdirectories of an ignored dir unless the subdirectory itself is also explicitly excluded with a ! rule. So in your example, git won’t traverse into data/folder at all, since it is ignored by data/*, and the !data/folder/my_data.csv.dvc exclusion will never be considered.
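
The difference can be seen with a small experiment. The sketch below assumes a throwaway repository containing data/folder/my_data.csv.dvc and checks that file against both rule sets; the before and after variable names are made up for illustration.

```shell
#!/bin/sh
set -e

# Throwaway repo with a .dvc file inside a subdirectory of data/.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p data/folder
touch data/folder/my_data.csv data/folder/my_data.csv.dvc

# Without a directory exclusion, data/folder/ is ignored by data/*,
# so git never considers the negation for the .dvc file.
printf '%s\n' 'data/*' '!data/folder/my_data.csv.dvc' > .gitignore
before=$(git check-ignore data/folder/my_data.csv.dvc || true)

# With the directory exclusion in place, the .dvc file is no longer ignored.
printf '%s\n' 'data/**' '!data/**/' '!data/**/*.dvc' > .gitignore
after=$(git check-ignore data/folder/my_data.csv.dvc || true)

echo "ignored with data/* rules:  ${before:-<nothing>}"
echo "ignored with data/** rules: ${after:-<nothing>}"
```

An empty result for the second rule set means git now sees the .dvc file and it can be committed.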

We have a .gitignore that ignores the /data folder

@Christoph-1 can you post the contents of this .gitignore? Exclusions for files inside git-ignored directories have to be written in a specific way, so depending on how you are currently writing the exclusion for my_data.csv.dvc it’s possible that git is still ignoring it.