dvc: dvc run : possible bug with deeply nested dependencies
Please provide information about your setup
Ubuntu 18.04, dvc==0.59.2, pip install with miniconda, Python 3.7
For deeply nested dependencies, it looks like dvc is not tracking them properly in the .dvc files.
The following script reproduces the issue:
#!/bin/bash
set -x
set -e
rm -rf dvc_test
mkdir dvc_test && cd dvc_test
mkdir scripts
mkdir -p data/recommended/dataset1/dataset1_proc
echo bar > data/recommended/dataset1/v1.txt
git init
dvc init
echo -e "import sys\
\nwith open(sys.argv[1], 'w') as f: f.write(sys.argv[2])" > scripts/script.py
# Run works
dvc run -y \
    -w "$PWD/data/recommended/dataset1/dataset1_proc" \
    -f ./data/recommended/dataset1/dataset1_proc/v1.dvc \
    -d ../../../../scripts/script.py \
    -o v1 \
    "mkdir v1 && python ../../../../scripts/script.py v1/v1.txt proc_data"
# Inspecting v1.dvc shows that the script.py dependency is missing one ../
cat data/recommended/dataset1/dataset1_proc/v1.dvc
# Because of that, repro does not work
dvc repro data/recommended/dataset1/dataset1_proc/v1.dvc
Inspecting the v1.dvc file shows:
md5: 4ae72e168a0a6a2f1aaadfb5628640f7
cmd: mkdir v1 && python ../../../../scripts/script.py v1/v1.txt proc_data
deps:
- md5: 791b9c74b1d9308a3226b93a36689dad
  path: ../../../scripts/script.py
outs:
- md5: 188ed6cb603658d01ef7ba8fb7c434fe.dir
  path: v1
  cache: true
  metric: false
  persist: false
+ dvc repro data/recommended/dataset1/dataset1_proc/v1.dvc
Which indicates that the script.py dependency is indeed missing one ../
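To make the effect of the missing ../ concrete, here is a small sketch of where each path lands. It assumes that the dependency path is resolved relative to the directory holding v1.dvc (which is also the -w working directory in the script); the helper name resolve_from_stage is made up for illustration and is not part of DVC.

```python
import posixpath

# Directory that holds v1.dvc, relative to the repository root (taken from the script).
STAGE_DIR = "data/recommended/dataset1/dataset1_proc"


def resolve_from_stage(dep_path, stage_dir=STAGE_DIR):
    """Return where a stage-relative path lands, relative to the repo root.

    Hypothetical helper: it only does path arithmetic and assumes paths are
    resolved against the stage file's directory.
    """
    return posixpath.normpath(posixpath.join(stage_dir, dep_path))


# Path passed to `dvc run -d` (four levels up).
passed = resolve_from_stage("../../../../scripts/script.py")
# Path actually written to v1.dvc (three levels up).
recorded = resolve_from_stage("../../../scripts/script.py")

print(passed)    # scripts/script.py
print(recorded)  # data/scripts/script.py
```

Under that assumption, the recorded dependency points at data/scripts/script.py, which the script never creates, so it is consistent with repro failing on the stage.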
About this issue
- State: closed
- Created 5 years ago
- Comments: 29 (18 by maintainers)
@tdeboissiere Ok, I am able to reproduce this on docker with:
Investigating. Thank you for your patience 🙂
EDIT: An interesting detail is that the dvcfile has even fewer …/ now:
EDIT2: with --no-scm everything stays the same, so it is unlikely to be GitPython’s fault.
EDIT3: confirmed pretty old regression, investigating closer…
@tdeboissiere my understanding is that when you run a command with dvc run/dvc repro it should keep your environment variables unchanged (obvious example $PATH that is used to find python and other binaries). It’s not what I see in some cases (that link on Discord). I don’t have an explanation yet - is it DVC, some zsh settings, some specific machine settings - we don’t know yet. But my thinking was - can it be the case here as well? Some changes to the environment when you run commands with DVC.
@efiop My pleasure, it’s always a treat to get my problems solved here!
@tdeboissiere Merged a fix for this into master, will release a new dvc version with it ASAP. In the meantime, you could try installing from master to check if that works for you too. I.e.
Thank you so much for reporting this issue and helping us investigate it! We really appreciate that 🙂
@tdeboissiere Ok, the patch is taking a bit longer, because the bug is quite deep and the proper solution breaks other parts of the code temporarily. Basically, the issue is os.path.relpath that we are using in PathInfo.__str__, which in turn gets used in PathInfo.as_posix() when we are dumping the dvc file after dvc run. So depending on where you are located, it might resolve a relative path differently. E.g. if you are in /home/user and run os.path.relpath("../path") you’ll get ../path, but if you are in / then you’ll get path. That is where your ../ went missing. The difference between my and your machines is that you were running from /home/user/subdir and I was running from /home/user/git/dvc/subdir. So a workaround would be to simply move your root directory a few levels deeper.
ETA for a fixed release is tomorrow.
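The relpath behaviour described above can be checked without changing directories. This is only a sketch of the standard-library behaviour, not DVC's actual code path: relpath_from is a hypothetical helper that resolves a path as if the process were in cwd and then makes it relative to cwd again, roughly what os.path.relpath(path) does with the real current directory.

```python
import posixpath


def relpath_from(cwd, path):
    """Resolve `path` as if the process were in `cwd`, then express the
    result relative to `cwd` again.

    Hypothetical helper for illustration; it uses absolute POSIX paths so
    the result does not depend on the real current directory.
    """
    absolute = posixpath.normpath(posixpath.join(cwd, path))
    return posixpath.relpath(absolute, cwd)


# The two cases from the comment above.
print(relpath_from("/home/user", "../path"))  # ../path
print(relpath_from("/", "../path"))           # path

# With more ../ than there are directory levels above cwd, the extra ones
# are clamped at the filesystem root and silently disappear.
print(relpath_from("/a", "../../x"))          # ../x
```

The last case shows the general pattern: any ../ that would climb above / is dropped during normalization, so the round trip returns fewer ../ than went in. That is also why starting from a directory a few levels deeper avoids the clamping.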
Ran the following script on an Ubuntu 18.04 laptop in /home/user/debug with bash debug.sh
The only line which is different between env_before.txt and env_after.txt is
@efiop Nope, still got the same error, on multiple machines