dvc: dvc push doesn't recognise that files are missing in remote storage

For some reason, I was missing a number of files in my remote storage on GCS. When I run dvc pull, it fails with the following error:

ERROR: failed to download 'gs://redacted/dvc/redacted/00/642ad56326eb0b6caf3784810a49a0' to '../.dvc/cache/00/642ad56326eb0b6caf3784810a49a0' - 'NoneType' object has no attribute 'size'

To fix this, I decided to run dvc add and dvc push on a machine that still has these files. Both commands complete without errors, and dvc push reports no problems. However, it does not actually upload the missing files to the remote storage. In the end, I had to solve this by manually uploading these files to my remote storage on GCS.

Bug Report

Output of dvc version:

$ dvc version 
DVC version: 1.1.6
Python version: 3.7.6
Platform: Linux-4.9.0-12-amd64-x86_64-with-debian-9.12
Binary: False
Package: pip
Supported remotes: gs, hdfs, http, https, ssh
Cache: reflink - not supported, hardlink - supported, symlink - supported
Filesystem type (cache directory): ('ext4', '/dev/sda1')
Repo: dvc, git
Filesystem type (workspace): ('ext4', '/dev/sda1')

Attachment: dvc_push_log.txt

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 19 (12 by maintainers)

Most upvoted comments

Ran into the same issue, which atekoa describes well.

By the way, we deleted the file f9c8def4b2a1a6b783209d933e26a6.dir from the remote, and this time the missing files were uploaded. Now we are left wondering whether the same thing has happened in our other projects.

Resorted to doing this as well for our S3 remote, and pushing the files again then worked. Something like a force push with a -f or --force flag would’ve been fantastic!
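
For readers trying to picture why deleting the .dir file helps, here is a minimal toy model in Python. It assumes (this is not confirmed anywhere in this thread) that push treats a directory manifest already present on the remote as proof that every file it lists is there too. The function `push`, the short keys, and the `force` parameter are all hypothetical; `force` only models the flag requested above and is not an existing DVC option.

```python
from typing import Dict, List, Set


def push(local_dirs: Dict[str, List[str]], remote: Set[str], force: bool = False) -> List[str]:
    """Upload objects missing from `remote` and return the uploaded keys.

    local_dirs maps a directory manifest key (the ".dir" object) to the
    keys of the files it lists. `force` is hypothetical.
    """
    uploaded = []
    for dir_key, file_keys in local_dirs.items():
        # Assumed shortcut: if the manifest is already on the remote,
        # skip the per-file check for that directory entirely.
        if dir_key in remote and not force:
            continue
        for key in file_keys + [dir_key]:
            if key not in remote:
                remote.add(key)
                uploaded.append(key)
    return uploaded


# Made-up keys standing in for real cache hashes.
local = {"f9.dir": ["00/aaa", "01/bbb"]}
remote: Set[str] = set()

first_push = push(local, remote)       # everything is uploaded
remote.discard("00/aaa")               # a file disappears from the remote
second_push = push(local, remote)      # manifest present, so nothing is re-uploaded
remote.discard("f9.dir")               # workaround: delete the manifest
third_push = push(local, remote)       # the missing file is noticed again
remote.discard("01/bbb")
forced_push = push(local, remote, force=True)  # hypothetical force flag
```

Under this assumption, the second push is a no-op even though a file is gone, which matches the behaviour reported in this issue, and removing the manifest (or forcing the check) restores the per-file comparison.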

@atekoa this is not currently planned, but as @efiop noted previously, it would be good for us to have this flag to allow force pushing directories. I’ve created a separate issue that you can follow for further updates.

@efiop Hmm, nope, that didn’t change anything… It is easy for me to replicate this. I just need to delete a file in the remote storage after I have pushed it.

I didn’t run gc but someone else may have (I have warned everyone about running gc on the remote). I’m not really sure yet what caused it but I’m wondering if there are other missing files as well.
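
One way to check for other missing files, sketched in Python under stated assumptions: the remote uses the same `<2 characters>/<rest of hash>` layout as the local cache (as the paths in the error message above suggest), and you have already obtained a listing of remote object keys by some other means, such as your storage provider's listing tool. The function name `missing_from_remote` and the shortened keys in the demo are hypothetical.

```python
import os
import tempfile
from typing import Iterable, List


def missing_from_remote(cache_dir: str, remote_keys: Iterable[str]) -> List[str]:
    """Return local cache objects, as "<2 chars>/<rest>" keys, absent from remote_keys."""
    present = set(remote_keys)
    local = []
    for prefix in sorted(os.listdir(cache_dir)):
        sub = os.path.join(cache_dir, prefix)
        # Only two-character subdirectories are treated as cache buckets.
        if len(prefix) != 2 or not os.path.isdir(sub):
            continue
        for name in sorted(os.listdir(sub)):
            local.append(f"{prefix}/{name}")
    return [key for key in local if key not in present]


# Demo with a throwaway cache directory and made-up, shortened keys.
with tempfile.TemporaryDirectory() as cache:
    for key in ("00/642a", "f9/c8de.dir"):
        os.makedirs(os.path.join(cache, key[:2]), exist_ok=True)
        open(os.path.join(cache, key), "w").close()
    # Pretend the remote listing contains only the manifest.
    missing = missing_from_remote(cache, ["f9/c8de.dir"])
```

This only compares names and does not verify file contents, so treat it as a first pass and not as a full integrity check.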