dvc: ERROR: failed to pull data from the cloud - Checkout failed for following targets:
Bug Report
Description
On GitHub Actions, I’m receiving the following error message during a `dvc pull` (the full log is here):
2023-06-22T15:47:53.8024770Z dvc.exceptions.CheckoutError: Checkout failed for following targets:
2023-06-22T15:47:53.8026096Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/deblurring/64/group1-shard1of6.bin
2023-06-22T15:47:53.8027090Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/deblurring/64/group1-shard5of6.bin
2023-06-22T15:47:53.8028055Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/deblurring/64/group1-shard2of6.bin
2023-06-22T15:47:53.8029031Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/enhancement/64/group1-shard1of4.bin
2023-06-22T15:47:53.8030027Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/enhancement/256/group1-shard1of4.bin
2023-06-22T15:47:53.8030981Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/denoising/64/group1-shard2of6.bin
2023-06-22T15:47:53.8031924Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/denoising/64/group1-shard4of6.bin
2023-06-22T15:47:53.8032902Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/denoising/64/group1-shard5of6.bin
2023-06-22T15:47:53.8033864Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/denoising/64/group1-shard1of6.bin
2023-06-22T15:47:53.8034807Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/denoising/256/group1-shard6of6.bin
2023-06-22T15:47:53.8047775Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/denoising/64/group1-shard6of6.bin
2023-06-22T15:47:53.8049024Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/denoising/256/group1-shard5of6.bin
2023-06-22T15:47:53.8050010Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/denoising/256/group1-shard2of6.bin
2023-06-22T15:47:53.8050973Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/denoising/256/group1-shard4of6.bin
2023-06-22T15:47:53.8051949Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/denoising/256/group1-shard1of6.bin
2023-06-22T15:47:53.8052920Z /home/runner/work/UpscalerJS/UpscalerJS/models/maxim-experiments/models/deraining/64/group1-shard3of4.bin
2023-06-22T15:47:53.8053476Z Is your cache up to date?
2023-06-22T15:47:53.8054014Z <https://error.dvc.org/missing-files>
It appears that a subset of files fails to be pulled.
Things I’ve tried:
- I’ve made a fresh clone of the repository and run `dvc pull` there without any issues; all models get pulled successfully (roughly the steps sketched after this list).
- I’d set up a cache folder for DVC in the GitHub repo. I’ve temporarily removed this cache and am still observing the issue.
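For reference, a minimal sketch of that local check (the repository URL is a placeholder; the same two steps are what fail in GitHub Actions):

```
# Fresh clone into a new directory (placeholder URL), then pull everything
git clone <repo-url> upscalerjs-fresh
cd upscalerjs-fresh
dvc pull   # locally this completes and all model shards are fetched
```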
Reproduce
I’m not quite sure how to reproduce this, as it only happens on GitHub Actions. Here is a sample run where it happens; I cannot reproduce it locally.
Environment information
Output of `dvc doctor`:
2023-06-22T15:41:00.9579273Z ##[group]Run dvc doctor
2023-06-22T15:41:00.9579679Z dvc doctor
2023-06-22T15:41:00.9638865Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail ***0***
2023-06-22T15:41:00.9639320Z env:
2023-06-22T15:41:00.9639648Z TF_CPP_MIN_LOG_LEVEL: 2
2023-06-22T15:41:00.9640091Z BROWSERSTACK_USERNAME: ***-GitHubAction
2023-06-22T15:41:00.9640545Z BROWSERSTACK_ACCESS_KEY: ***
2023-06-22T15:41:00.9640964Z BROWSERSTACK_PROJECT_NAME: UpscalerJS
2023-06-22T15:41:00.9641722Z BROWSERSTACK_BUILD_NAME: [creating-maxim-models] Commit 9970243: Add new Maxim models:
- Deblurring
- Denoising
- Deraining
- Dehazing
- Indoor
- Outdoor
- Low Light Enhancement
- Retouching [Workflow: 3511]
2023-06-22T15:41:00.9642397Z ##[endgroup]
2023-06-22T15:41:01.5753054Z DVC version: 3.1.0 (***args.pkg***)
2023-06-22T15:41:01.5753975Z -------------------------------
2023-06-22T15:41:01.5755086Z Platform: Python 3.10.8 on Linux-5.15.0-1040-azure-x86_64-with-glibc2.35
2023-06-22T15:41:01.5755647Z Subprojects:
2023-06-22T15:41:01.5756062Z
2023-06-22T15:41:01.5763663Z Supports:
2023-06-22T15:41:01.5764845Z azure (adlfs = 2023.4.0, knack = 0.10.1, azure-identity = 1.13.0),
2023-06-22T15:41:01.5765960Z gdrive (pydrive2 = 1.15.4),
2023-06-22T15:41:01.5766492Z gs (gcsfs = 2023.6.0),
2023-06-22T15:41:01.5767538Z hdfs (fsspec = 2023.6.0, pyarrow = 12.0.1),
2023-06-22T15:41:01.5768272Z http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
2023-06-22T15:41:01.5769371Z https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
2023-06-22T15:41:01.5769944Z oss (ossfs = 2021.8.0),
2023-06-22T15:41:01.5770881Z s3 (s3fs = 2023.6.0, boto3 = 1.26.76),
2023-06-22T15:41:01.5771417Z ssh (sshfs = 2023.4.1),
2023-06-22T15:41:01.5772460Z webdav (webdav4 = 0.9.8),
2023-06-22T15:41:01.5773008Z webdavs (webdav4 = 0.9.8),
2023-06-22T15:41:01.5773880Z webhdfs (fsspec = 2023.6.0)
2023-06-22T15:41:01.5774406Z Config:
2023-06-22T15:41:01.5775302Z Global: /home/runner/.config/dvc
2023-06-22T15:41:01.5775835Z System: /etc/xdg/dvc
2023-06-22T15:41:01.5777114Z Cache types: <https://error.dvc.org/no-dvc-cache>
2023-06-22T15:41:01.5777704Z Caches: local
2023-06-22T15:41:01.5778601Z Remotes: gdrive, s3, gdrive
2023-06-22T15:41:01.5779216Z Workspace directory: ext4 on /dev/sdb1
2023-06-22T15:41:01.5780114Z Repo: dvc, git
2023-06-22T15:41:01.5780824Z Repo.site_cache_dir: /var/tmp/dvc/repo/024b6be4ceeb4a42ecb120da7fbc6489
I’d be happy to provide any other information to help debug this. I’m not sure of the best way to troubleshoot, as the issue only seems to appear in GitHub Actions.
UPDATE
One additional thing that might be useful: there are three remotes associated with this repo:
- `gdrive` (a regular Google Drive)
- `gdrive-service-account` (the same account as above, but set up to work with a service account)
- `s3` (Amazon S3, mirrored)

The reason for the two gdrive remotes is so that users can clone the repo and easily pull models (it’s an open source library and `gdrive` is the default remote), but also to enable CI integration (which AFAIK requires a service account).
That said, I’ve confirmed locally that pulling from the service account works successfully. Just not in the Github Actions session for some reason.
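For context, a minimal sketch of how a setup like this is typically configured (the folder ID, bucket, and key path below are placeholders, not the project’s real values):

```
# Default remote used by contributors: a regular Google Drive folder (placeholder ID)
dvc remote add -d gdrive gdrive://<folder-id>

# The same Drive folder, authenticated through a service account for CI
dvc remote add gdrive-service-account gdrive://<folder-id>
dvc remote modify gdrive-service-account gdrive_use_service_account true
dvc remote modify --local gdrive-service-account \
    gdrive_service_account_json_file_path <path/to/service-account-key.json>

# Mirrored S3 remote (placeholder bucket)
dvc remote add s3 s3://<bucket>/models

# CI pulls explicitly from the service-account remote
dvc pull -r gdrive-service-account
```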
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 27 (12 by maintainers)
Closing this as a duplicate of https://github.com/iterative/dvc/issues/9651.
This should be resolved in the latest DVC release (3.14.0 or later). In GitHub Actions CI you should not need to do anything other than updating DVC.
On non-CI machines (for @gtebbutt and @WilliamHarvey97) you will need to remove the `Repo.site_cache_dir` folder (from `dvc doctor`) after updating DVC and then retry your pull. If you still see this issue after updating and clearing the site cache, feel free to re-open this ticket.
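A minimal sketch of those steps (the exact upgrade command depends on how DVC was installed, and the site-cache path should be taken from your own `dvc doctor` output):

```
# Upgrade DVC to a release containing the fix (3.14.0 or later)
pip install -U "dvc>=3.14.0"

# On non-CI machines, remove the Repo.site_cache_dir folder reported by `dvc doctor`
# (on the runner above it was the path below; yours will differ)
rm -rf /var/tmp/dvc/repo/024b6be4ceeb4a42ecb120da7fbc6489

# Then retry the pull
dvc pull
```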
3.15.0 works like a charm! No errors.

I’m seeing the same issue, persisting on 3.6.0 (at least on a machine that I just upgraded to that version after seeing it come up on 3.5.1), and have intermittently seen it back when all our machines were on v2 as well.
We use an S3 remote, but have a sync task set up to duplicate that entire bucket across to a read-only bucket on GCP (to avoid repeatedly paying egress costs when pulling data to servers on GCP).
The root cause seems to be pulling from the secondary remote before the sync task has had a chance to run. On the first attempt `dvc pull` shows one error message, but on all subsequent attempts (even after the buckets have replicated successfully) it fails with another.
Running on a freshly cloned copy of the repo seems to work fine, so I’m assuming something in the state of the original copy of the repo is getting messed up by trying to pull when the files are missing from the cloud, and then it’s not successfully re-checking to see if they’ve appeared on subsequent runs?
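For illustration, a minimal sketch of that kind of mirror setup (the bucket and remote names are hypothetical; the thread does not show the actual sync job or remote config):

```
# One-way sync of the primary S3 bucket into a read-only GCS mirror
# (gsutil can read s3:// sources when AWS credentials are configured;
#  a scheduled Storage Transfer job would work equally well)
gsutil -m rsync -r s3://primary-dvc-bucket gs://readonly-dvc-mirror

# DVC side: primary remote on S3, mirror remote on GCS
dvc remote add -d s3primary s3://primary-dvc-bucket
dvc remote add gcsmirror gs://readonly-dvc-mirror

# Servers running on GCP pull from the mirror to avoid S3 egress costs
dvc pull -r gcsmirror
```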
I had this issue. On my side there was a problem with a cache folder. I deleted it with `rm -r /Library/Caches/dvc/` (I’m on macOS); otherwise try `rm -r /var/caches/dvc` on Linux.
I suspect this occurred to me while switching between branches and between DVC v2 and v3.
Thanks for the response, @efiop.
Maybe the two gdrive remotes (one service account, one not)? I’m not sure if that’s standard practice for open source projects that need to enable CI.
Nothing else that I can think of seems strange.
(I’m not sure what here is sensitive, so I’ll be overly conservative)
I don’t believe so. It’s using whatever container GitHub Actions provides, which I believe is not a shared machine (or it may be shared, but I assume things run in containers). Is there a way to determine whether the DVC cache would be shared more broadly?
I was able to SSH into the machine, and I manually deleted `.dvc/cache` (which I think is the cache folder?) along with `.dvc/tmp` and reran the command. I got the same error, with what looks like the same missing files, so at least it’s consistent.
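Roughly what that amounted to on the runner (paths relative to the repo root; removing them did not change the outcome):

```
# Remove the repo-local cache and temp directories, then retry
rm -rf .dvc/cache .dvc/tmp
dvc pull   # still fails with the same set of missing files
```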