dvc: "Error: Failed to pull data from the cloud" when pulled data is a directory

When pulling data from remote storage, I execute the following command: dvc pull train.dvc

The content of train.dvc is:

cmd: python train.py
deps:
- md5: 454410a0e96e7c268914287865e94bbc
  path: data/dataset.csv
md5: c1b4f121d34f4dbd426089bbbec26d3c
outs:
- cache: true
  md5: 9f1f44cc23e76e794f7423d4a76147a3.dir
  path: outputs/models/

I then get the following error:

[##############################] 100% Collecting information
[##############################] 100% outputs/models
[##############################] 100% Collecting information
(1/3): [##############################] 100% outputs/models/task_a.pkl
(2/3): [##############################] 100% outputs/models/task_b.pkl
(3/3): [##############################] 100% outputs/models/task_c.pkl
Checking out ' outputs/models' with cache '9f1f44cc23e76e794f7423d4a76147a3.dir'.
Linking directory 'outputs/models'.
Error: Failed to pull data from the cloud: stat: path should be string, bytes, os.PathLike or integer, not NoneType

It seems to happen because the output of train.dvc is a directory. It works fine when it's a file.
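
For readers comparing their own stage files: in the train.dvc shown above, the output checksum ends in `.dir`, which is how a directory output can be told apart from a single file. A minimal sketch of that check follows. It assumes the stage file has already been loaded into a plain dict; `directory_outputs` and `train_stage` are hypothetical names used only for illustration, not part of DVC.

```python
def directory_outputs(stage):
    """Return the paths of outputs whose checksum carries the '.dir' suffix.

    `stage` is assumed to be the stage file loaded into a dict
    (hypothetical helper, not a DVC API).
    """
    return [
        out["path"]
        for out in stage.get("outs", [])
        if str(out.get("md5", "")).endswith(".dir")
    ]


# The values below are copied from the train.dvc in this report.
train_stage = {
    "cmd": "python train.py",
    "deps": [
        {"md5": "454410a0e96e7c268914287865e94bbc", "path": "data/dataset.csv"},
    ],
    "md5": "c1b4f121d34f4dbd426089bbbec26d3c",
    "outs": [
        {
            "cache": True,
            "md5": "9f1f44cc23e76e794f7423d4a76147a3.dir",
            "path": "outputs/models/",
        },
    ],
}

if __name__ == "__main__":
    print(directory_outputs(train_stage))
```

If the list is non-empty, the stage has at least one directory output, which is the case that fails here.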

Some info: dvc==0.21.0, installed with pip, on macOS 10.14.

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 18 (7 by maintainers)

Most upvoted comments

@efiop, it seems the issue might be that my Mac unintentionally created an empty file called Icon. DVC was tracking it, but gdrive failed to store it because it was empty, so during dvc pull DVC could not pull the Icon file since it was not in my gdrive remote. Let me test this and see if I am correct.
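
One way to test the hypothesis above is to list zero-byte files inside the tracked directory before pushing. The following is a rough sketch using only the standard library; `find_empty_files` is a hypothetical helper, and the directory passed to it is whatever data directory you track.

```python
import os


def find_empty_files(root):
    """Return relative paths of zero-byte regular files under `root`, sorted."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip anything that is not a regular file (e.g. broken symlinks).
            if os.path.isfile(full) and os.path.getsize(full) == 0:
                empty.append(os.path.relpath(full, root))
    return sorted(empty)


if __name__ == "__main__":
    import sys

    # Usage: python find_empty.py <tracked-directory>
    for path in find_empty_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Any file it prints is a candidate for the kind of empty file described in the comment.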

Thanks to @mroutis for a lightning fast ⚡ fix! 0.21.2 is out, please upgrade. 🎉

@imflash217 @we-taper Created https://github.com/iterative/dvc/issues/4286 . Let's move the discussion there. Thanks for the feedback!

@guerrapin Looks like an unrelated issue. Could you please create a new issue for it so we could continue there? Please provide full error too with dvc pull -v.

Got the same problem. The update fixed it and everything works perfectly. Thanks 🙏🏼

I'm blown away. 0.21.2 really fixed it. Thanks a lot … you're lightning fast! Do you guys ever sleep? 😃

Thanks for reporting the errors, @guerrapin, @stvogel, it was very helpful! 🙂

I submitted a patch at https://github.com/iterative/dvc/pull/1378 and hope to release it today.

A workaround for this could be the --force option. Just make sure your working directory is not dirty, i.e. has no changes that could be overwritten or removed when the checkout happens. Always be careful when using --force.

I got the same error. In case it helps, here is the output from dvc pull -v:

Debug: fetched: [(921540, '1543417823157699328', '97eaf6e4e0cc4a84934b8f1f8f331417', '1543430639580068864')]
Debug: Inode '921540', mtime '1543417823157699328', actual mtime '1543417823157699328'.
Debug: UPDATE state SET timestamp = "1543430639622790144" WHERE inode = 921540
Debug: File '/home/vsionai/projects/vsionai/tmp/data-science-example/.dvc/cache/97/eaf6e4e0cc4a84934b8f1f8f331417', md5 '97eaf6e4e0cc4a84934b8f1f8f331417', actual '97eaf6e4e0cc4a84934b8f1f8f331417'
Checking out 'data/processed' with cache 'f05c98a20a3d3e282bd36a3e9f41278f.dir'.
Linking directory 'data/processed'.
Debug: SELECT count from state_info WHERE rowid=1
Debug: fetched: [(32,)]
Debug: UPDATE state_info SET count = 32 WHERE rowid = 1
Error: Traceback (most recent call last):
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/command/data_sync.py", line 29, in do_run
    force=self.args.force)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/project.py", line 789, in pull
    self.checkout(target=target, with_deps=with_deps, force=force)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/project.py", line 500, in checkout
    stage.checkout(force=force)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/stage.py", line 483, in checkout
    out.checkout(force=force)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/output/local.py", line 69, in checkout
    force=force)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/remote/local.py", line 329, in checkout
    if force or self._already_cached(p):
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/remote/local.py", line 349, in _already_cached
    return not self.changed_cache(current_md5)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/remote/local.py", line 112, in changed_cache
    if self.changed_cache_file(md5):
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/remote/local.py", line 95, in changed_cache_file
    if self.state.changed(cache, md5=md5):
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/state.py", line 88, in changed
    actual = self.update(path)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/state.py", line 323, in update
    return self._do_update(path)[0]
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/state.py", line 274, in _do_update
    if not os.path.exists(path):
  File "/usr/lib/python3.6/genericpath.py", line 19, in exists
    os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Error: Failed to pull data from the cloud: stat: path should be string, bytes, os.PathLike or integer, not NoneType
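
The last frames of the traceback show `os.path.exists(path)` being called with `path` set to `None`, and `os.stat` rejecting it with a `TypeError`. The sketch below reproduces that behaviour in isolation and shows one possible guard. `safe_exists` is a hypothetical helper for illustration only and is not claimed to be what the DVC patch does.

```python
import os


def reproduce():
    """Call os.path.exists(None) and return the error message, if any."""
    try:
        os.path.exists(None)
    except TypeError as exc:
        # On the Python 3.6 from the traceback this is the
        # "stat: path should be string, bytes, os.PathLike or integer" error.
        return str(exc)
    return None


def safe_exists(path):
    """Hypothetical guard: treat a missing (None) path as 'does not exist'."""
    if path is None:
        return False
    return os.path.exists(path)


if __name__ == "__main__":
    print(reproduce())
    print(safe_exists(None))
```

The guard only hides the symptom; why the path ends up as `None` for directory outputs is the actual bug being discussed in this thread.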

With data.dvc containing:

deps:
- md5: 3dcb1133b71ee29e89a27c952c8831e3.dir
  path: data/raw
- md5: c46da892a15ff5c9425bd3fddfee1a14
  path: src/data/split_dataset.py
md5: ded22f652f899bd36c8822c9da6747f2
outs:
- cache: true
  md5: f05c98a20a3d3e282bd36a3e9f41278f.dir
  path: data/processed