dvc: dvc pull: fails on a cloned repository that contains dvc-imported files

Bug Report

dvc pull: fails on a cloned repository that contains dvc-imported files

Description

We have a data registry repo (datasets), which contains different datasets in folders eg. dataset1 and then another model repo, that dvc imports a folder, then if we clone this model repo and run dvc pull then it fails to pull the data from the remote. I have tried to replicate this error running in a docker container based on images: python:3.10 and ubuntu:22.04, and latest DVC version 3.13.2 and python: 3.10.12, but I was unable to and it seems to work. However, on my current machine running ubuntu 22.04 LTS, it fails:

$ uname -a
Linux mzurad 5.19.0-50-generic #50-Ubuntu SMP PREEMPT_DYNAMIC Mon Jul 10 18:24:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ python3 -V
Python 3.10.12
$ git --version
git version 2.34.1

Reproduce with or without external cache

# remove comments to test with external cache
rm -rf testing-dvc

mkdir testing-dvc
cd testing-dvc

mkdir storage
#mkdir cache

function init_dvc {
    git init .
    dvc init
    #dvc cache dir ../cache
    #dvc config cache.type symlink
    dvc remote add -d local ../storage
}

function create_datasets {
    mkdir datasets
    cd datasets
    init_dvc
    mkdir dataset1
    cd dataset1
    echo {} > gt.json
    mkdir images
    cd images
    echo 1 > img1.png
    echo 2 > img2.png
    cd ../..
    dvc add -vvv dataset1/*
    git add .
    git commit -m 'adding dataset1'
    dvc push -vvv
    cd ..
}

function create_model {
    mkdir model
    cd model
    init_dvc
    mkdir data
    dvc import -vvv -o data/raw --rev main ../datasets dataset1
    git add .
    git commit -m 'dvc import dataset1'
    dvc push -vvv
    dvc status -vvv
    dvc data status -vvv
    cd ..
}

create_datasets
create_model

git clone model model_cloned
cd model_cloned
#dvc cache dir --unset
dvc pull

dvc doctor
cd ..
tree

Expected

Expected output is that model_cloned correctly contains the same files as model, therefore the output of tree testing-dvc, should be:

.
├── datasets
│   └── dataset1
│       ├── gt.json
│       ├── gt.json.dvc
│       ├── images
│       │   ├── img1.png
│       │   └── img2.png
│       └── images.dvc
├── model
│   └── data
│       ├── raw
│       │   ├── gt.json
│       │   └── images
│       │       ├── img1.png
│       │       └── img2.png
│       └── raw.dvc
├── model_cloned
│   └── data
│       ├── raw
│       │   ├── gt.json
│       │   └── images
│       │       ├── img1.png
│       │       └── img2.png
│       └── raw.dvc
└── storage/files/md5
    ├── 26
    │   └── ab0db90d72e28ad0ba1e22ee510510
    ├── 8a
    │   └── 80554c91d9fca8acb82f023de02f11
    ├── b0
    │   └── 26324c6904b2a9cb4b88d6d61c81d1
    └── fb
        └── 1baa69ab7cc94156ef79a5a674bd8b.dir

but instead it errors out and here is the detailed output of dvc pull inside model_cloned

$ dvc pull -vvv 
2023-08-10 02:54:21,186 DEBUG: v3.13.2 (pip), CPython 3.10.12 on Linux-5.19.0-50-generic-x86_64-with-glibc2.35
2023-08-10 02:54:21,186 DEBUG: command: /home/mzurad/code/testing/.venv/bin/dvc pull -vvv
2023-08-10 02:54:21,186 TRACE: Namespace(quiet=0, verbose=3, cprofile=False, cprofile_dump=None, yappi=False, yappi_separate_threads=False, viztracer=False, viztracer_depth=None, viztracer_async=False, pdb=False, instrument=False, instrument_open=False, show_stack=False, cd='.', cmd='pull', jobs=None, targets=[], remote=None, all_branches=False, all_tags=False, all_commits=False, force=False, with_deps=False, recursive=False, run_cache=False, glob=False, allow_missing=False, func=<class 'dvc.commands.data_sync.CmdDataPull'>, parser=DvcParser(prog='dvc', usage=None, description='Data Version Control', formatter_class=<class 'argparse.RawTextHelpFormatter'>, conflict_handler='error', add_help=False))
2023-08-10 02:54:21,342 TRACE:   705.13 mks in collecting stages from /home/mzurad/code/testing/testing-dvc/model_cloned
2023-08-10 02:54:21,351 TRACE:     8.68 ms in collecting stages from /home/mzurad/code/testing/testing-dvc/model_cloned/data
2023-08-10 02:54:21,377 TRACE:     6.64 mks in collecting stages from /
2023-08-10 02:54:21,379 TRACE:     1.31 ms in collecting stages from /dataset1
2023-08-10 02:54:21,392 WARNING: No file hash info found for '/home/mzurad/code/testing/testing-dvc/model_cloned/data/raw/.gitignore'. It won't be created.
2023-08-10 02:54:21,392 DEBUG: failed to create '/home/mzurad/code/testing/testing-dvc/model_cloned/data/raw/.gitignore' from 'None'
Traceback (most recent call last):
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc_data/index/checkout.py", line 90, in _create_files
    src_fs, src_path = storage_obj.get(entry)
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 173, in get
    raise ValueError
ValueError

2023-08-10 02:54:21,393 WARNING: No file hash info found for '/home/mzurad/code/testing/testing-dvc/model_cloned/data/raw/gt.json'. It won't be created.
2023-08-10 02:54:21,393 DEBUG: failed to create '/home/mzurad/code/testing/testing-dvc/model_cloned/data/raw/gt.json' from 'None'
Traceback (most recent call last):
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc_data/index/checkout.py", line 90, in _create_files
    src_fs, src_path = storage_obj.get(entry)
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 173, in get
    raise ValueError
ValueError

2023-08-10 02:54:21,393 WARNING: No file hash info found for '/home/mzurad/code/testing/testing-dvc/model_cloned/data/raw/images/img2.png'. It won't be created.
2023-08-10 02:54:21,393 DEBUG: failed to create '/home/mzurad/code/testing/testing-dvc/model_cloned/data/raw/images/img2.png' from 'None'
Traceback (most recent call last):
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc_data/index/checkout.py", line 90, in _create_files
    src_fs, src_path = storage_obj.get(entry)
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 173, in get
    raise ValueError
ValueError

2023-08-10 02:54:21,393 WARNING: No file hash info found for '/home/mzurad/code/testing/testing-dvc/model_cloned/data/raw/images/img1.png'. It won't be created.
2023-08-10 02:54:21,393 DEBUG: failed to create '/home/mzurad/code/testing/testing-dvc/model_cloned/data/raw/images/img1.png' from 'None'
Traceback (most recent call last):
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc_data/index/checkout.py", line 90, in _create_files
    src_fs, src_path = storage_obj.get(entry)
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc_data/index/index.py", line 173, in get
    raise ValueError
ValueError

2023-08-10 02:54:21,393 DEBUG: Removing '/home/mzurad/code/testing/testing-dvc/model_cloned/data/raw'
Everything is up to date.
2023-08-10 02:54:21,393 ERROR: failed to pull data from the cloud - Checkout failed for following targets:
data/raw
Is your cache up to date?
<https://error.dvc.org/missing-files>
Traceback (most recent call last):
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 31, in run
    stats = self.repo.pull(
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 64, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc/repo/pull.py", line 43, in pull
    stats = self.checkout(
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 64, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/mzurad/code/testing/.venv/lib/python3.10/site-packages/dvc/repo/checkout.py", line 208, in checkout
    raise CheckoutError([relpath(out_path) for out_path in failed], stats)
dvc.exceptions.CheckoutError: Checkout failed for following targets:
data/raw
Is your cache up to date?
<https://error.dvc.org/missing-files>

2023-08-10 02:54:21,396 DEBUG: Analytics is enabled.
2023-08-10 02:54:21,415 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpr1kfn_ul']'
2023-08-10 02:54:21,416 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpr1kfn_ul']'

Regression testing

I have run the same test for all version since 2.45 with the following script:

VERSIONS="2.45.0 2.45.1 2.46.0 2.47.0 2.47.1 2.47.2 2.48.0 2.49.0 2.50.0 2.51.0 2.52.0 2.53.0 2.54.0 2.55.0 2.56.0 2.57.0 2.57.1 2.57.2 2.57.3 2.58.0 2.58.1 2.58.2 3.0.0a0 3.0.0a1 3.0.0a2 3.0.0 3.1.0 3.2.0 3.2.1 3.2.2 3.2.3 3.2.4 3.3.0 3.3.1 3.4.0 3.5.0 3.5.1 3.6.0 3.7.0 3.8.0 3.8.1 3.9.0 3.9.1 3.10.0 3.10.1 3.11.0 3.11.1 3.12.0 3.13.0 3.13.1 3.13.2 3.13.3"

rm -rf logs
mkdir logs

TEST="test -e"
#test for symlinks when using external cache
#TEST="test -L"

for version in $VERSIONS;
do
    rm -rf .venv
    python3 -m venv .venv > logs/$version.log 2>&1
    source .venv/bin/activate
    pip install dvc==$version > logs/$version.log 2>&1
    bash reproduce.sh > logs/$version.log 2>&1
    cd testing-dvc/model_cloned/data
    ($TEST raw/gt.json && $TEST raw/images/img2.png && $TEST raw/images/img1.png) \
        && echo "${version} WORKS" \
        || echo "${version} FAILED"
    cd ../../..
done

which gives us the following versions that work and fail:

version without external cache with external cache
2.45.0 WORKS WORKS
WORKS WORKS
2.52.0 WORKS WORKS
2.53.0 WORKS FAILED
2.54.0 WORKS FAILED
2.55.0 WORKS FAILED
2.56.0 WORKS FAILED
2.57.0 FAILED WORKS
2.57.1 FAILED FAILED
FAILED FAILED
2.58.2 FAILED FAILED
3.0.0a0 FAILED WORKS
3.0.0a1 FAILED FAILED
FAILED FAILED
3.13.2 FAILED FAILED
3.13.3 FAILED FAILED

so it seems to work for some older 2.x versions:

Environment information

Output of dvc doctor on DVC version: 3.13.2:

$ dvc doctor
DVC version: 3.13.2 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-5.19.0-50-generic-x86_64-with-glibc2.35
Subprojects:
	dvc_data = 2.12.2
	dvc_objects = 0.25.0
	dvc_render = 0.5.3
	dvc_task = 0.3.0
	scmrepo = 1.1.0
Supports:
	http (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.8.5, aiohttp-retry = 2.8.3)
Config:
	Global: /home/mzurad/.config/dvc
	System: /etc/xdg/xdg-ubuntu/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-root
Caches: local
Remotes: local
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-root
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/737e6c27473cb6374a1e6df03130e328

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 20 (14 by maintainers)

Commits related to this issue

Most upvoted comments

Finally think I found a way to reproduce this (or something similar) internally. @iterative/dvc You need to configure access to the aws sandbox and then run this script:

set -x
rm -rf working not-working
echo "Pull succeeds when correct profile is set"
git clone https://github.com/dberenbaum/download-dvc-dir working
cd working
AWS_PROFILE=iterative-sandbox dvc pull
git clone https://github.com/dberenbaum/download-dvc-dir not-working
cd not-working
echo "Pull fails when invalid profile is set"
AWS_PROFILE=invalid dvc pull
echo "Pull still fails after correcting profile"
AWS_PROFILE=iterative-sandbox dvc pull

Here’s the output I see:

+ rm -rf working not-working
+ echo 'Pull succeeds when correct profile is set'
Pull succeeds when correct profile is set
+ git clone https://github.com/dberenbaum/download-dvc-dir working
Cloning into 'working'...
remote: Enumerating objects: 210, done.
remote: Counting objects: 100% (210/210), done.
remote: Compressing objects: 100% (205/205), done.
remote: Total 210 (delta 4), reused 208 (delta 2), pack-reused 0
Receiving objects: 100% (210/210), 18.04 MiB | 5.31 MiB/s, done.
Resolving deltas: 100% (4/4), done.
+ cd working
+ AWS_PROFILE=iterative-sandbox
+ dvc pull
Collecting                                                      |189 [00:07, 24.1entry/s]
Fetching
Building workspace index                                       |187 [00:00, 1.87kentry/s]
Comparing indexes                                              |190 [00:00, 24.9kentry/s]
Applying changes                                               |2.00 [00:00,   486file/s]
A       model.pkl
A       lstm/
2 files added and 186 files fetched
+ git clone https://github.com/dberenbaum/download-dvc-dir not-working
Cloning into 'not-working'...
remote: Enumerating objects: 210, done.
remote: Counting objects: 100% (210/210), done.
remote: Compressing objects: 100% (205/205), done.
remote: Total 210 (delta 4), reused 208 (delta 2), pack-reused 0
Receiving objects: 100% (210/210), 18.04 MiB | 3.17 MiB/s, done.
Resolving deltas: 100% (4/4), done.
+ cd not-working
+ echo 'Pull fails when invalid profile is set'
Pull fails when invalid profile is set
+ AWS_PROFILE=invalid
+ dvc pull
Collecting                                                     |3.00 [00:06, 2.06s/entry]
ERROR: failed to connect to s3 (dave-sandbox/cache) - The config profile (invalid) could not be found
ERROR: failed to connect to s3 (dave-sandbox/cache/files/md5) - The config profile (invalid) could not be found
Fetching
ERROR: failed to pull data from the cloud - 2 files failed to download
+ echo 'Pull still fails after correcting profile'
Pull still fails after correcting profile
+ AWS_PROFILE=iterative-sandbox
+ dvc pull
Collecting                                                     |0.00 [00:06,    ?entry/s]
Fetching
Building workspace index                                       |187 [00:00, 1.60kentry/s]
Comparing indexes                                               |190 [00:03, 59.7entry/s]
WARNING: No file hash info found for '/private/tmp/working/not-working/lstm/epoch=0-step=2.ckpt'. It won't be created.
Applying changes                                               |1.00 [00:00,   127file/s]
A       model.pkl
1 file added and 2 files fetched
ERROR: failed to pull data from the cloud - Checkout failed for following targets:
lstm
Is your cache up to date?
<https://error.dvc.org/missing-files>

Works for both external and internal caches, also on the newly released 3.14. Thanks a lot !

Can I remove all the directories inside /var/tmp/dvc/repo ? What are they for ? If they can introduce hard to debug bugs like this one, should they be perhaps nuked when you install a new version, although I am not sure how that could help. Or at least there should be docs somewhere to suggest removing this directory.

On another note, I’ve been using DVC for a few months now, and my initial impression is that new releases are happening perhaps too frequently and often breaking core functionality. Don’t get me wrong, but shouldn’t there be more focus on stability ? I already received some feedback from colleagues that get quite hesitant about using DVC at all, when they see these types of bugs.

@dberenbaum As promised, I’ll look into dropping fetch cache in drop_data_index and will close it.

EDIT: doesn’t seem like we need to for now. Closing.

Ok, so there were a few things here, but the main one is really https://github.com/iterative/dvc-data/pull/447/commits/99fd441e2f3fa69939a25d00eb5714e640898fe1 , where we were thinking that directory entries were supposed to be loaded later, but they were not (e.g. because of missing objects or missing creds or auth error or something else), and that was then cached in site_cache_dir and was not affected by drop_data_index because it is not under tree prefix (which it probably should drop, taking a look).

After https://github.com/iterative/dvc-data/pull/447 , those repro scripts from above start to work for me just fine. Will take a final look in the morning.

@dberenbaum Thanks again for the repro script, that helps a lot! 🙏

Here is a simplified local version that shows the same problem and narrows it down:

#!/bin/bash
set -e
set -x

rm -rf mytest
mkdir mytest
cd mytest

mkdir mydatareg
pushd mydatareg
MYDATAREG=$(pwd)
git init
dvc init
git add .
git commit -m "init"
mkdir data
echo foo > data/foo
dvc add data
git add data.dvc .gitignore
git commit -m "data"
popd

mkdir myrepo
cd myrepo
dvc init --no-scm
dvc import $MYDATAREG data

rm -rf .dvc/cache
rm -rf /Library/Caches/dvc
rm -rf data

mv $MYDATAREG/.dvc/cache $MYDATAREG/.dvc/cache.back

dvc pull || echo FAILED

mv $MYDATAREG/.dvc/cache.back $MYDATAREG/.dvc/cache

dvc pull || echo STILL FAILED

rm -rf /Library/Caches/dvc  # obviously this is only relevant for macs

dvc pull

So it is index for import that is not reloaded for the second pull after the first one failed. Looking into it…

Nice, thank you for the repro script! It looks like a similar problem to one that was recently raised here. The whole thread is long and not all related, but that comment looks similar to what you raise.

@efiop WDYT? Below I slightly adjust the script to do dvc import --no-download followed by dvc pull, which also unexpectedly fails.

FROM ubuntu:20.04
# Deps
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
        sudo git python3.8-dev python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir dvc==3.19.0
# Create a non-root user (perhaps not required but is closer to my setup)
ARG user=user
ARG uid=1000
ARG gid=${uid}
RUN groupadd --gid ${gid} ${user} \
    && useradd -s /bin/bash --uid ${uid} --gid ${gid} -m ${user} \
    && echo ${user} ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/${user} \
    && echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers \
    && chmod 0440 /etc/sudoers.d/${user}
USER ${user}
# Configure git
RUN git config --global user.name ${user} \
    && git config --global user.email ${user}@example.com
# DVC data reg git repo with a directory as remote storage
RUN mkdir /home/${user}/storage && mkdir /home/${user}/data_reg
WORKDIR /home/${user}/data_reg
RUN git init && dvc init
RUN dvc config core.autostage true
RUN dvc remote add -d storage /home/${user}/storage
RUN mkdir -p data/raw && echo 1 > data/raw/data.txt
RUN dvc add data/raw && dvc push
RUN git status && git commit -m "add test data"
# clean all local data
# RUN rm -rf /var/tmp/dvc \
#     && rm -rf .dvc/cache \
#     && rm -rf .dvc/tmp \
#     && rm -rf data/raw
# DVC model training repo, imports data from data_reg git repo
RUN mkdir -p /home/${user}/model_repo/data
WORKDIR /home/${user}/model_repo
RUN dvc init --no-scm
# This part runs without issue
RUN dvc import --no-download ../data_reg data/raw -o data/raw
RUN cat data/raw.dvc
# This is where it fails...
RUN dvc pull data/raw.dvc --verbose

I tried my hand at a minimum reproducible error with a Dockerfile. It does produce an error but it is not the same as the original post. Let me know if I am missing anything or should move to a new issue:

FROM ubuntu:20.04
# Deps
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
        sudo git python3.8-dev python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir dvc==3.19.0
# Create a non-root user (perhaps not required but is closer to my setup)
ARG user=user
ARG uid=1000
ARG gid=${uid}
RUN groupadd --gid ${gid} ${user} \
    && useradd -s /bin/bash --uid ${uid} --gid ${gid} -m ${user} \
    && echo ${user} ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/${user} \
    && echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers \
    && chmod 0440 /etc/sudoers.d/${user}
USER ${user}
# Configure git
RUN git config --global user.name ${user} \
    && git config --global user.email ${user}@example.com
# DVC data reg git repo with a directory as remote storage
RUN mkdir /home/${user}/storage && mkdir /home/${user}/data_reg
WORKDIR /home/${user}/data_reg
RUN git init && dvc init
RUN dvc config core.autostage true
RUN dvc remote add -d storage /home/${user}/storage
RUN mkdir -p data/raw && echo 1 > data/raw/data.txt
RUN dvc add data/raw && dvc push
RUN git status && git commit -m "add test data"
# clean all local data
RUN rm -rf /var/tmp/dvc \
    && rm -rf .dvc/cache \
    && rm -rf .dvc/tmp \
    && rm -rf data/raw
# DVC model training repo, imports data from data_reg git repo
RUN mkdir -p /home/${user}/model_repo/data
WORKDIR /home/${user}/model_repo
RUN dvc init --no-scm
# This part downloads the content without issue
RUN dvc import ../data_reg data/raw -o data/raw
RUN cat data/raw.dvc
RUN ls -l data/raw
# Manually remove cache to illustrate.
# Same result as if we ran `dvc import --no-download`
RUN rm -rf /var/tmp/dvc \
    && rm -rf .dvc/cache \
    && rm -rf data/raw
# This is where it fails...
RUN dvc pull data/raw.dvc --verbose

Running docker build --progress=plain . gives me the following output:

...
#25 0.990 2023-09-11 19:18:47,784 DEBUG: failed to load ('data', 'raw') from storage local (data/raw) - failed to load directory ('data', 'raw'): [Errno 2] No such file or directory: '/home/user/model_repo/.dvc/cache/files/md5/54/3ef358e20c4e74b80d2c71243210f0.dir'
#25 0.990 Traceback (most recent call last):
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/index/index.py", line 541, in _load_from_storage
#25 0.990     _load_from_object_storage(trie, entry, storage)
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/index/index.py", line 477, in _load_from_object_storage
#25 0.990     obj = Tree.load(storage.odb, root_entry.hash_info, hash_name=storage.odb.hash_name)
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/hashfile/tree.py", line 193, in load
#25 0.990     with obj.fs.open(obj.path, "r") as fobj:
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_objects/fs/base.py", line 228, in open
#25 0.990     return self.fs.open(path, mode=mode, **kwargs)
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_objects/fs/local.py", line 136, in open
#25 0.990     return open(path, mode=mode, encoding=encoding)
#25 0.990 FileNotFoundError: [Errno 2] No such file or directory: '/home/user/model_repo/.dvc/cache/files/md5/54/3ef358e20c4e74b80d2c71243210f0.dir'
#25 0.990 
#25 0.990 The above exception was the direct cause of the following exception:
#25 0.990 
#25 0.990 Traceback (most recent call last):
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/index/index.py", line 543, in _load_from_storage
#25 0.990     _load_from_file_storage(trie, entry, storage)
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/index/index.py", line 515, in _load_from_file_storage
#25 0.990     for entry in build_entries(path, fs):
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/index/build.py", line 54, in build_entries
#25 0.990     for root, dirs, files in walk_iter:
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/fsspec/spec.py", line 418, in walk
#25 0.990     listing = self.ls(path, detail=True, **kwargs)
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc/fs/dvc.py", line 299, in ls
#25 0.990     for info in dvc_fs.ls(dvc_path, detail=True):
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_objects/fs/base.py", line 425, in ls
#25 0.990     return self.fs.ls(path, detail=detail, **kwargs)
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/fs.py", line 134, in ls
#25 0.990     for key, info in self.index.ls(root_key, detail=True):
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/index/index.py", line 693, in ls
#25 0.990     self._ensure_loaded(root_key)
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/index/index.py", line 690, in _ensure_loaded
#25 0.990     self._load(prefix, entry)
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/index/index.py", line 636, in _load
#25 0.990     self.onerror(entry, exc)
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/index/index.py", line 568, in _onerror
#25 0.990     raise exc
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/index/index.py", line 634, in _load
#25 0.990     _load_from_storage(self._trie, entry, self.storage_map[key])
#25 0.990   File "/usr/local/lib/python3.8/dist-packages/dvc_data/index/index.py", line 556, in _load_from_storage
#25 0.990     raise DataIndexDirError(f"failed to load directory {entry.key}") from last_exc
#25 0.990 dvc_data.index.index.DataIndexDirError: failed to load directory ('data', 'raw')
#25 0.990 
#25 0.991 2023-09-11 19:18:47,785 DEBUG: failed to create '/home/user/model_repo/data/raw' from 'None'
#25 0.991 NoneType: None
#25 0.991 
#25 0.991 Everything is up to date.
#25 0.991 2023-09-11 19:18:47,785 ERROR: failed to pull data from the cloud - Checkout failed for following targets:
#25 0.991 data/raw
#25 0.991 Is your cache up to date?
#25 0.991 <https://error.dvc.org/missing-files>
#25 0.991 Traceback (most recent call last):
#25 0.991   File "/usr/local/lib/python3.8/dist-packages/dvc/commands/data_sync.py", line 31, in run
#25 0.991     stats = self.repo.pull(
#25 0.991   File "/usr/local/lib/python3.8/dist-packages/dvc/repo/__init__.py", line 61, in wrapper
#25 0.991     return f(repo, *args, **kwargs)
#25 0.991   File "/usr/local/lib/python3.8/dist-packages/dvc/repo/pull.py", line 43, in pull
#25 0.991     stats = self.checkout(
#25 0.991   File "/usr/local/lib/python3.8/dist-packages/dvc/repo/__init__.py", line 61, in wrapper
#25 0.991     return f(repo, *args, **kwargs)
#25 0.991   File "/usr/local/lib/python3.8/dist-packages/dvc/repo/checkout.py", line 204, in checkout
#25 0.991     raise CheckoutError([relpath(out_path) for out_path in failed], stats)
#25 0.991 dvc.exceptions.CheckoutError: Checkout failed for following targets:
#25 0.991 data/raw
#25 0.991 Is your cache up to date?
#25 0.991 <https://error.dvc.org/missing-files>
...