dvc: dvc.api.open: HTTPError

Bug Report

dvc.api.open fails with an HTTPError while dvc pull works just fine.

Setup

$ dvc version
DVC version: 2.0.2 (pip)
---------------------------------
Platform: Python 3.6.10 on Darwin-20.3.0-x86_64-i386-64bit
Supports: http, https, s3, ssh
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc, git

Steps to reproduce

Given the situation already described here, the following code

import logging
from dvc.api import open

logger = logging.getLogger('dvc')
logger.setLevel(logging.DEBUG)

with open('foo', mode='rb') as file:
    bar = file.read()

fails (without any log) with the following uncaught exception:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-X-XXXXXXXXXXXX> in <module>
----> 1 with open('foo', mode='rb') as file:
      2     bar = file.read()
      3

.../python3.6/contextlib.py in __enter__(self)
     79     def __enter__(self):
     80         try:
---> 81             return next(self.gen)
     82         except StopIteration:
     83             raise RuntimeError("generator didn't yield") from None

.../python3.6/site-packages/dvc/api.py in _open(path, repo, rev, remote, mode, encoding)
     76     with Repo.open(repo, rev=rev, subrepos=True, uninitialized=True) as _repo:
     77         with _repo.open_by_relpath(
---> 78             path, remote=remote, mode=mode, encoding=encoding
     79         ) as fd:
     80             yield fd

.../python3.6/contextlib.py in __enter__(self)
     79     def __enter__(self):
     80         try:
---> 81             return next(self.gen)
     82         except StopIteration:
     83             raise RuntimeError("generator didn't yield") from None

.../python3.6/site-packages/dvc/repo/__init__.py in open_by_relpath(self, path, remote, mode, encoding)
    479         try:
    480             with fs.open(
--> 481                 path, mode=mode, encoding=encoding, remote=remote,
    482             ) as fobj:
    483                 yield fobj

.../python3.6/contextlib.py in __enter__(self)
     79     def __enter__(self):
     80         try:
---> 81             return next(self.gen)
     82         except StopIteration:
     83             raise RuntimeError("generator didn't yield") from None

.../python3.6/site-packages/dvc/utils/http.py in open_url(url, mode, encoding, **iter_opts)
     14     assert mode in {"r", "rt", "rb"}
     15
---> 16     with iter_url(url, **iter_opts) as (response, it):
     17         bytes_stream = IterStream(it)
     18

.../python3.6/contextlib.py in __enter__(self)
     79     def __enter__(self):
     80         try:
---> 81             return next(self.gen)
     82         except StopIteration:
     83             raise RuntimeError("generator didn't yield") from None

.../python3.6/site-packages/dvc/utils/http.py in iter_url(url, chunk_size)
     57             response.close()
     58
---> 59     response = request()
     60     it = gen(response)
     61     try:

.../python3.6/site-packages/dvc/utils/http.py in request(headers)
     34         if response.status_code == 404:
     35             raise FileNotFoundError(f"Can't open {the_url}")
---> 36         response.raise_for_status()
     37         return response
     38

.../python3.6/site-packages/requests/models.py in raise_for_status(self)
    941
    942         if http_error_msg:
--> 943             raise HTTPError(http_error_msg, response=self)
    944
    945     def close(self):

HTTPError: 400 Client Error: Bad Request for url: https://bucket-name.s3-eu-west-1.amazonaws.com/bucket-name/XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXXXX%2FXXXXXXXX%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210305T184008Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

(note the inconsistent eu-west-1 and us-east-1 regions present in the URL). Meanwhile, dvc pull foo works perfectly. Note that it suddenly stopped working (with a previous dvc version, before upgrading), raising the following exception:

HTTPError: 403 Client Error: Forbidden for url: https://bucket-name.s3-eu-west-1.amazonaws.com/bucket-name/XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX?AWSAccessKeyId=XXXXXXXXXXXXXXXXXXXX&Signature=XXXXXXXXXXXXXXXXXXXX%2FXXXXXX%3D&Expires=1614884948

Any idea what’s going on?
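
The region mismatch noted above can also be checked programmatically. The following is a minimal sketch, assuming a presigned URL shaped like the ones in the two errors above (region embedded in the host name, and, for SigV4, in the X-Amz-Credential scope). The helper name presigned_url_regions is hypothetical and not part of dvc or boto3.

```python
import re
from urllib.parse import parse_qs, urlsplit


def presigned_url_regions(url):
    """Return (host_region, signing_region) for an S3 presigned URL.

    Either element is None when it cannot be determined from the URL.
    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)

    # Host is assumed to look like <bucket>.s3-<region>.amazonaws.com
    # or <bucket>.s3.<region>.amazonaws.com.
    host_match = re.search(
        r"\.s3[.-]([a-z0-9-]+)\.amazonaws\.com$", parts.hostname or ""
    )
    host_region = host_match.group(1) if host_match else None

    # SigV4 credential scope: <access-key>/<date>/<region>/s3/aws4_request.
    # parse_qs decodes the %2F separators into "/".
    credential = parse_qs(parts.query).get("X-Amz-Credential", [None])[0]
    signing_region = None
    if credential:
        scope = credential.split("/")
        if len(scope) == 5:
            signing_region = scope[2]

    return host_region, signing_region
```

If the two returned values differ, the URL was signed for a different region than the one it targets. For an older-style URL such as the one in the 403 error (AWSAccessKeyId/Signature/Expires), there is no credential scope, so the second value is None.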

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Hi @isidentical, sorry for the late answer. As of dvc==2.5.4, explicitly specifying region = eu-west-1 in .dvc/config is no longer required to avoid the failure, so the issue can indeed be considered closed.

Hey @hugo-ricateau-tiime, can you try to install dvc from master and give it a shot? pip install "dvc[s3] @ git+https://github.com/iterative/dvc"

What do you mean by "Sure, seems to be a signature mismatch [this is with dvc==1.11.16]:"? Is dvc==1.11.16 also problematic?

Yes, that is what puzzles me: it was working perfectly with dvc==1.11.16 until it suddenly stopped working (last Thursday, with the HTTPError: 403 Client Error: Forbidden for url: ... exception), without any change in the environment concerning dvc or any of its requirements. Then I upgraded dvc (along with its requirements, of course) in order to report against the latest version (which resulted in the HTTPError: 400 Client Error: Bad Request for url: ... exception). Note that nothing changed in the S3 or user configuration in the meantime.

I realise that my initial report was not that clear concerning this point; sorry about that.

@hugo-ricateau-tiime, this will possibly be fully resolved next week, when we migrate to s3fs. For now, I am not exactly sure that this is caused by that hardcoded constant, though it seems like the most reasonable explanation, considering that nothing else has changed in the signature generation as far as I am aware. I'll create a PR to address that signature issue ASAP (as I said, it might not be relevant).

@hugo-ricateau-tiime, can you also open that link (the one that you masked as https://bucket-name.s3-eu-west-1.amazonaws.com/bucket-name/XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX?AWSAccessKeyId=XXXXXXXXXXXXXXXXXXXX&Signature=XXXXXXXXXXXXXXXXXXXX%2FXXXXXX%3D&Expires=1614884948) in your browser and send me the full XML (or crop it if anything is private), so that I can see the error message?
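
As an alternative to opening the link in a browser, the XML error body can be read from the exception itself, since the traceback shows that requests raises HTTPError with response=self. The following is a minimal sketch, assuming the body follows the usual S3 error layout (an Error root element with Code and Message children). The helper name summarize_s3_error is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_s3_error(xml_text):
    """Extract the <Code> and <Message> fields from an S3 XML error body.

    Hypothetical helper; missing fields come back as None.
    """
    root = ET.fromstring(xml_text)
    return {
        "code": root.findtext("Code"),
        "message": root.findtext("Message"),
    }


# Possible usage around the failing call (requires requests and dvc):
#
#     import requests
#     from dvc.api import open as dvc_open
#
#     try:
#         with dvc_open('foo', mode='rb') as file:
#             bar = file.read()
#     except requests.HTTPError as exc:
#         print(summarize_s3_error(exc.response.text))
```

Only the code and message are extracted here, which also avoids pasting request IDs or other private fields into the thread.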