dvc: dvc.api.open: HTTPError
Bug Report
dvc.api.open fails with an HTTPError while dvc pull works just fine.
Setup
$ dvc version
DVC version: 2.0.2 (pip)
---------------------------------
Platform: Python 3.6.10 on Darwin-20.3.0-x86_64-i386-64bit
Supports: http, https, s3, ssh
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc, git
Steps to reproduce
Given the situation already described here, the following code
import logging
from dvc.api import open
logger = logging.getLogger('dvc')
logger.setLevel(logging.DEBUG)
with open('foo', mode='rb') as file:
    bar = file.read()
fails (without any log) with the following uncaught exception:
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-X-XXXXXXXXXXXX> in <module>
----> 1 with open('foo', mode='rb') as file:
2 bar = file.read()
3
.../python3.6/contextlib.py in __enter__(self)
79 def __enter__(self):
80 try:
---> 81 return next(self.gen)
82 except StopIteration:
83 raise RuntimeError("generator didn't yield") from None
.../python3.6/site-packages/dvc/api.py in _open(path, repo, rev, remote, mode, encoding)
76 with Repo.open(repo, rev=rev, subrepos=True, uninitialized=True) as _repo:
77 with _repo.open_by_relpath(
---> 78 path, remote=remote, mode=mode, encoding=encoding
79 ) as fd:
80 yield fd
.../python3.6/contextlib.py in __enter__(self)
79 def __enter__(self):
80 try:
---> 81 return next(self.gen)
82 except StopIteration:
83 raise RuntimeError("generator didn't yield") from None
.../python3.6/site-packages/dvc/repo/__init__.py in open_by_relpath(self, path, remote, mode, encoding)
479 try:
480 with fs.open(
--> 481 path, mode=mode, encoding=encoding, remote=remote,
482 ) as fobj:
483 yield fobj
.../python3.6/contextlib.py in __enter__(self)
79 def __enter__(self):
80 try:
---> 81 return next(self.gen)
82 except StopIteration:
83 raise RuntimeError("generator didn't yield") from None
.../python3.6/site-packages/dvc/utils/http.py in open_url(url, mode, encoding, **iter_opts)
14 assert mode in {"r", "rt", "rb"}
15
---> 16 with iter_url(url, **iter_opts) as (response, it):
17 bytes_stream = IterStream(it)
18
.../python3.6/contextlib.py in __enter__(self)
79 def __enter__(self):
80 try:
---> 81 return next(self.gen)
82 except StopIteration:
83 raise RuntimeError("generator didn't yield") from None
.../python3.6/site-packages/dvc/utils/http.py in iter_url(url, chunk_size)
57 response.close()
58
---> 59 response = request()
60 it = gen(response)
61 try:
.../python3.6/site-packages/dvc/utils/http.py in request(headers)
34 if response.status_code == 404:
35 raise FileNotFoundError(f"Can't open {the_url}")
---> 36 response.raise_for_status()
37 return response
38
.../python3.6/site-packages/requests/models.py in raise_for_status(self)
941
942 if http_error_msg:
--> 943 raise HTTPError(http_error_msg, response=self)
944
945 def close(self):
HTTPError: 400 Client Error: Bad Request for url: https://bucket-name.s3-eu-west-1.amazonaws.com/bucket-name/XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXXXX%2FXXXXXXXX%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210305T184008Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
(note the inconsistent eu-west-1 and us-east-1 regions in the URL). Meanwhile, dvc pull foo works perfectly. Note that dvc.api.open suddenly stopped working with an earlier dvc version, before I upgraded, raising the following exception:
HTTPError: 403 Client Error: Forbidden for url: https://bucket-name.s3-eu-west-1.amazonaws.com/bucket-name/XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX?AWSAccessKeyId=XXXXXXXXXXXXXXXXXXXX&Signature=XXXXXXXXXXXXXXXXXXXX%2FXXXXXX%3D&Expires=1614884948
Any idea what’s going on?
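The region mismatch noted above can be checked mechanically. The following is a minimal sketch, using only the Python standard library, that extracts the region embedded in the S3 hostname and the region in the X-Amz-Credential scope of a SigV4 presigned URL. The helper name presigned_url_regions and the hostname pattern are assumptions made for illustration; they are not part of dvc.

```python
import re
from urllib.parse import parse_qs, urlsplit


def presigned_url_regions(url):
    """Return (host_region, credential_region) for an S3 presigned URL.

    Hypothetical helper; either value is None when it cannot be found.
    """
    parts = urlsplit(url)

    # Hostnames such as bucket.s3-eu-west-1.amazonaws.com or
    # bucket.s3.eu-west-1.amazonaws.com carry the region right after "s3".
    host_match = re.search(
        r"(?:^|\.)s3[.-]([a-z0-9-]+)\.amazonaws\.com$", parts.hostname or ""
    )
    host_region = host_match.group(1) if host_match else None

    # SigV4 credential scope: <key>/<date>/<region>/<service>/aws4_request
    # (parse_qs already decodes %2F into "/").
    credential = parse_qs(parts.query).get("X-Amz-Credential", [""])[0]
    scope = credential.split("/")
    credential_region = scope[2] if len(scope) >= 5 else None

    return host_region, credential_region


# Shortened version of the masked URL from the traceback.
url = (
    "https://bucket-name.s3-eu-west-1.amazonaws.com/bucket-name/XX/XXXX"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=XXXX%2FXXXXXXXX%2Fus-east-1%2Fs3%2Faws4_request"
)
host_region, credential_region = presigned_url_regions(url)
if host_region != credential_region:
    print(f"host region {host_region} != signing region {credential_region}")
```

When the two values differ, the URL was signed for a different region than the one in the hostname, which is a plausible (though unconfirmed) cause of the 400 response.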
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (8 by maintainers)
Hi @isidentical, sorry for the late answer. As of dvc==2.5.4, explicitly specifying region = eu-west-1 in .dvc/config is no longer required to avoid the failure, so the issue can indeed be considered closed.
Hey @hugo-ricateau-tiime, can you try to install dvc from master and give it a shot?
pip install "dvc[s3] @ git+https://github.com/iterative/dvc"
Yes, that is what puzzles me: it was working perfectly with dvc==1.11.16 until it suddenly stopped working last Thursday (with the HTTPError: 403 Client Error: Forbidden for url: ... exception), without any change in the environment concerning dvc or any of its requirements. I then upgraded dvc (along with its requirements, of course) in order to report against the latest version, which resulted in the HTTPError: 400 Client Error: Bad Request for url: ... exception. Note that nothing changed in the s3 or user configurations in the meantime.
I realise that my initial report was not very clear on this point; sorry about that.
@hugo-ricateau-tiime possibly next week, when we migrate to s3fs, it will be fully resolved. For now I am not sure that this is caused by that hardcoded constant, though it seems like the most reasonable explanation, considering that nothing else changed in the signature generation that I am aware of. I’ll create a PR addressing that signature issue ASAP (as I said, it might not be relevant).
@hugo-ricateau-tiime can you also open that link (the one that you masked as https://bucket-name.s3-eu-west-1.amazonaws.com/bucket-name/XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX?AWSAccessKeyId=XXXXXXXXXXXXXXXXXXXX&Signature=XXXXXXXXXXXXXXXXXXXX%2FXXXXXX%3D&Expires=1614884948) in your browser and send me the full XML (or crop it if anything is private) so that I can see the error message?
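For reference, the XML body that S3 returns on a failed request can also be reduced to its key fields before sharing. Below is a minimal sketch using the standard library; the helper name parse_s3_error is hypothetical, and the sample body is purely illustrative (it is not the response from this thread).

```python
import xml.etree.ElementTree as ET


def parse_s3_error(xml_text):
    """Return the child elements of an S3 <Error> document as a dict.

    Hypothetical helper for illustration; not part of dvc.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "Error":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    # Each child (Code, Message, RequestId, ...) becomes one dict entry.
    return {child.tag: (child.text or "") for child in root}


# Illustrative body only; NOT the actual response from this issue.
sample = """<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  <RequestId>EXAMPLE</RequestId>
</Error>"""

info = parse_s3_error(sample)
print(info["Code"], "-", info["Message"])
```

The Code and Message fields are usually enough to tell a permissions problem apart from a signature or region problem, and fields such as RequestId can be cropped if they are considered private.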