dvc: dvc pull: unexpected error - [Errno 22] Bad Request

Bug Report

dvc pull: unexpected error

Description

I have several dvc resources imported into a project… The tracking files (.dvc) are committed to the repository that uses these resources. When attempting to pull (dvc pull) the associated tracked resources, I get an error:

An error occurred (400) when calling the HeadObject operation: Bad Request (relevant log from dvc pull -v below).

Reproduce

Example:

  1. dvc import resource into project
  2. later, or from a fresh checkout of the above git repo, attempt to dvc pull
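For reference, the reproduction amounts to something like the following sketch (the repository URLs and resource path are placeholders, not the actual ones from this report):

```shell
# Import a resource tracked in a separate artifacts repo
# (URLs and paths below are hypothetical)
dvc import https://git.example.com/org/artifacts-repo data/model.bin
git add model.bin.dvc .gitignore
git commit -m "Import model.bin via dvc"

# Later, from a fresh checkout of this project repo:
git clone https://git.example.com/org/project-repo
cd project-repo
dvc pull   # fails: [Errno 22] Bad Request (400 on HeadObject)
```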

Expected

Expecting the tracked resource to be retrieved from DVC. I am able to perform a dvc update <resource>, which does pull the dvc resource into the folder structure; of course, this causes the .dvc file to show as modified (even though it points to the same location/revision).
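As a sketch of that workaround (the resource name is a placeholder), dvc update fetches the data, at the cost of dirtying the .dvc file:

```shell
# Fetch the imported resource directly; this works where `dvc pull` fails
# (data/model.bin.dvc is a hypothetical path)
dvc update data/model.bin.dvc

# The .dvc file now shows as modified even though it points to
# the same upstream location/revision:
git status --short
```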

How should I proceed here? Is something corrupted?

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.8.3 (pip)
---------------------------------
Platform: Python 3.8.0 on Linux-3.10.0-1160.45.1.el7.x86_64-x86_64-with-glibc2.27
Supports:
	webhdfs (fsspec = 2021.11.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	s3 (s3fs = 2021.11.0, boto3 = 1.17.106)
Cache types: hardlink, symlink
Cache directory: nfs on LEB1MLNAS.company.com:/leb1mlnas_projects
Caches: local
Remotes: None
Workspace directory: nfs on LEB1MLNAS.company.com:/leb1mlnas_projects
Repo: dvc, git
$ dvc pull -vvv

CUT......

2021-11-23 09:03:02,390 TRACE: Assuming '/projects/shared_dvc_cache/d6/9a712149e586998fb73c9566bd7e9f' is unchanged since it is read-only                                                                              
2021-11-23 09:03:02,394 DEBUG: Preparing to transfer data from 's3://dvc-inspection-ai/bit_tool_artifacts/RepoA' to '../../../../../../../shared_dvc_cache'                                                                                       
2021-11-23 09:03:02,394 DEBUG: Preparing to collect status from '../../../../../../../shared_dvc_cache'
2021-11-23 09:03:02,394 DEBUG: Collecting status from '../../../../../../../shared_dvc_cache'
2021-11-23 09:03:02,395 DEBUG: Preparing to collect status from 's3://dvc-inspection-ai/bit_tool_artifacts/RepoA'                                                                                                                                 
2021-11-23 09:03:02,396 DEBUG: Collecting status from 's3://dvc-inspection-ai/bit_tool_artifacts/RepoA'
2021-11-23 09:03:02,396 DEBUG: Querying 1 hashes via object_exists
2021-11-23 09:03:02,510 ERROR: unexpected error - [Errno 22] Bad Request: An error occurred (400) when calling the HeadObject operation: Bad Request                                                                                             
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/s3fs/core.py", line 250, in _call_s3
    out = await method(**additional_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/aiobotocore/client.py", line 155, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (400) when calling the HeadObject operation: Bad Request

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/usr/local/lib/python3.8/dist-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/usr/local/lib/python3.8/dist-packages/dvc/command/data_sync.py", line 30, in run
    stats = self.repo.pull(
  File "/usr/local/lib/python3.8/dist-packages/dvc/repo/__init__.py", line 50, in wrapper
    return f(repo, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dvc/repo/pull.py", line 29, in pull
    processed_files_count = self.fetch(
  File "/usr/local/lib/python3.8/dist-packages/dvc/repo/__init__.py", line 50, in wrapper
    return f(repo, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dvc/repo/fetch.py", line 67, in fetch
    d, f = _fetch(
  File "/usr/local/lib/python3.8/dist-packages/dvc/repo/fetch.py", line 87, in _fetch
    downloaded += repo.cloud.pull(obj_ids, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dvc/data_cloud.py", line 114, in pull
    return transfer(
  File "/usr/local/lib/python3.8/dist-packages/dvc/objects/transfer.py", line 153, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dvc/objects/status.py", line 166, in compare_status
    src_exists, src_missing = status(
  File "/usr/local/lib/python3.8/dist-packages/dvc/objects/status.py", line 132, in status
    odb.hashes_exist(hashes, name=str(odb.path_info), **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dvc/objects/db/base.py", line 468, in hashes_exist
    remote_hashes = self.list_hashes_exists(hashes, jobs, name)
  File "/usr/local/lib/python3.8/dist-packages/dvc/objects/db/base.py", line 419, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dvc/objects/db/base.py", line 410, in exists_with_progress
    ret = self.fs.exists(path_info)
  File "/usr/local/lib/python3.8/dist-packages/dvc/fs/fsspec_wrapper.py", line 136, in exists
    return self.fs.exists(self._with_bucket(path_info))
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/usr/local/lib/python3.8/dist-packages/s3fs/core.py", line 822, in _exists
    await self._info(path, bucket, key, version_id=version_id)
  File "/usr/local/lib/python3.8/dist-packages/s3fs/core.py", line 1016, in _info
    out = await self._call_s3(
  File "/usr/local/lib/python3.8/dist-packages/s3fs/core.py", line 270, in _call_s3
    raise err
OSError: [Errno 22] Bad Request
------------------------------------------------------------
2021-11-23 09:03:02,825 DEBUG: Version info for developers:
DVC version: 2.8.3 (pip)
---------------------------------
Platform: Python 3.8.0 on Linux-3.10.0-1160.45.1.el7.x86_64-x86_64-with-glibc2.27
Supports:
	webhdfs (fsspec = 2021.11.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	s3 (s3fs = 2021.11.0, boto3 = 1.17.106)
Cache types: hardlink, symlink
Cache directory: nfs on LEB1MLNAS.company.com:/leb1mlnas_projects
Caches: local
Remotes: None
Workspace directory: nfs on LEB1MLNAS.company.com:/leb1mlnas_projects
Repo: dvc, git

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

@efiop Oh, I see. Yeah, that’s definitely a clue! So in the repo you are running dvc pull in, you only have dvc imported artifacts?

Yes, that is correct… In this particular case, all of the individually imported dvc artifacts originate in the same git artifacts repository.

@efiop Are you able to clone those artifact’s repo and run dvc pull there?

Yes, that works just fine… I just performed:

  1. git clone <artifacts_repo>
  2. dvc pull from within the artifacts repo.

Sorry I had meant to say

dvc pull --jobs 1

produced the same error that I originally posted….

I am in a Linux VM with only 5 CPUs; the MinIO endpoint is reasonably sized, and I am the only one currently interacting with it (so I doubt it's a capacity issue).

I get the same error on a Windows client (12-core laptop), and a colleague of mine also received the same error attempting to access the same repo.

The ‘Remotes: None’ - I am guessing that is because the git project doesn’t have its own DVC remote configured; rather, it only contains dvc-imported resources… Is that a clue?