dvc: dvc get: S3 timeout error when trying to download files
Bug Report
Description
I have several files tracked with DVC in an S3 bucket. When I try to download these files with `dvc get`, it throws one of the following errors at some point, and only a couple of the files end up downloaded:
ERROR: unexpected error - Connect timeout on endpoint URL: "http://<ip_machine>/latest/meta-data/iam/security-credentials/<role_aws>"
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
ERROR: unexpected error - Connect timeout on endpoint URL: "http://<ip_machine>/latest/meta-data/iam/security-credentials/"
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
However, this does not happen if:
- I download the data by cloning the git repo and running `dvc pull` on it.
- I run the `dvc get` command with the parameter `-j 1`.
- I download the data by using `aws s3 cp` (without using DVC).
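The `-j 1` workaround from the list above can be sketched as a small shell helper. The function name `dvc_get_serial` and the example URL and path in the comment are hypothetical; only `dvc get` and its `-j`/`--jobs` option come from DVC itself. The report does not establish why a single job avoids the timeout, so treat this as a workaround rather than a fix.

```shell
# Hypothetical helper: run `dvc get` with a single download job.
# Usage: dvc_get_serial <repo-url> <path-in-repo> [extra dvc get options...]
dvc_get_serial() {
    dvc get -j 1 "$@"
}

# Example call (placeholder URL and path, not taken from the report):
# dvc_get_serial https://github.com/example/data-registry data/dataset
```

Downloads run one at a time this way, so expect it to be slower than the default parallel mode on large datasets.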
Reproduce
- Track a dataset with DVC and an S3 bucket
- Run `dvc get` to download the tracked dataset
Expected
All files are downloaded, not just a couple of them.
Environment information
$ dvc doctor
DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.9.12 on Linux-5.13.0-1023-aws-x86_64-with-glibc2.34
Supports:
webhdfs (fsspec = 2022.5.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
s3 (s3fs = 2022.5.0, boto3 = 1.21.21)
About this issue
- State: closed
- Created 2 years ago
- Comments: 16 (8 by maintainers)
Leaving this here since it might be useful for people searching for this error.
We had the exact same error message when we tried to reference files in a data-registry from a different repo using
This happened because, when importing from the remote repository, DVC could not authenticate against the S3 storage: we had our credentials in a .dvc/config.local file in the registry repository, and that file is ignored and not pushed to the git remote. One solution is to use `--global` when adding remotes and secrets, so that they are stored globally on the machine. See https://github.com/iterative/dvc/issues/4858 for more info about the `config.local` issue.
@Madrueno Are you using regular AWS S3 or some S3-compatible storage? Do you have an endpoint configured?
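For anyone trying the `--global` suggestion from the `config.local` comment above, a minimal sketch might look like the following. The helper name `add_global_s3_remote` is hypothetical, and the remote name and bucket URL are yours to supply. It is an assumption here that the remote name should match the one used in the registry's `.dvc/config`, and that the keys are available in the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.

```shell
# Hypothetical helper: register an S3 remote and its credentials in the
# global (per-user) DVC config rather than in .dvc/config.local.
# Reads the keys from the standard AWS environment variables.
# Usage: add_global_s3_remote <remote-name> <s3-url>
add_global_s3_remote() {
    name="$1"
    url="$2"
    # Stop at the first failing step so a half-configured remote is noticed.
    dvc remote add --global "$name" "$url" &&
    dvc remote modify --global "$name" access_key_id "$AWS_ACCESS_KEY_ID" &&
    dvc remote modify --global "$name" secret_access_key "$AWS_SECRET_ACCESS_KEY"
}

# Example call (placeholder values):
# add_global_s3_remote storage s3://my-bucket/dvc-store
```

Because the global config lives outside any repository, the credentials never reach the git remote, but they do apply to every DVC project run by that user on the machine.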