dvc: get/import: could not perform a HEAD request
DVC version: 0.62.1
Python version: 3.7.3
Platform: Darwin-18.7.0-x86_64-i386-64bit
Binary: False
Cache: reflink - True, hardlink - True, symlink - True
Filesystem type (cache directory): ('apfs', '/dev/disk1s1')
Filesystem type (workspace): ('apfs', '/dev/disk1s1')
I’m trying to import a directory versioned in our own dataset registry project into an empty, non-Git DVC project, but getting this cryptic error:
$ dvc import --rev 0547f58 \
git@github.com:iterative/dataset-registry.git \
use-cases/data
Importing 'use-cases/data (git@github.com:iterative/dataset-registry.git)' -> 'data'
ERROR: failed to import 'use-cases/data' from 'git@github.com:iterative/dataset-registry.git'. - unable to find DVC-file with output '../../../../private/var/folders/_c/3mt_xn_d4xl2ddsx2m98h_r40000gn/T/tmphs83czecdvc-repo/use-cases/data'
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
The directory in question has the cache file name b6923e1e4ad16ea1a7e2b328842d56a2.dir (see use-cases/cats-dogs.dvc at that revision). And the default remote is [configured](https://github.com/iterative/dataset-registry/blob/master/.dvc/config) to https://remote.dvc.org/dataset-registry (which is an HTTP redirect to the s3://dvc-public/remote/dataset-registry bucket). The file seems to be in the remote.
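One quick way to check that is a HEAD request against the object's expected URL. A minimal sketch, assuming DVC's standard cache layout (the first two characters of the MD5 become a subdirectory, so the exact URL is an assumption):

```python
import requests

# Assumed object URL under DVC's cache layout: md5[:2] / md5[2:] + ".dir"
url = ("https://remote.dvc.org/dataset-registry/"
       "b6/923e1e4ad16ea1a7e2b328842d56a2.dir")

# Follow the HTTP -> S3 redirect; a 200 means the file is in the remote.
print(requests.head(url, allow_redirects=True).status_code)
```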
Am I just doing something wrong here (hopefully), or is dvc import broken?
About this issue
- State: closed
- Created 5 years ago
- Comments: 24 (24 by maintainers)
Commits related to this issue
- setup: update pyinstaller to 3.5 As a part of the research for https://github.com/iterative/dvc/issues/2600 — committed to iterative/dvc by efiop 5 years ago
- setup: update pyinstaller to 3.5 (#2615) As a part of the research for https://github.com/iterative/dvc/issues/2600 — committed to iterative/dvc by efiop 5 years ago
- http: reuse requests.Session This way we are able to properly utilize automatic connection pools and not create new fds for each request, which overflows ulimit for max fds very quickly on mac and wi... — committed to efiop/dvc by efiop 5 years ago
- http: reuse requests.Session (#2646) This way we are able to properly utilize automatic connection pools and not create new fds for each request, which overflows ulimit for max fds very quickly on ... — committed to iterative/dvc by efiop 5 years ago
https://requests.kennethreitz.org/en/master/user/advanced/ says that a Session uses a connection pool by default. Changing to using a session instead of calling requests.request directly made everything work for me, and I no longer see fluctuations in fd numbers. Will send a patch ASAP. Kudos @pared 🎉
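For illustration, a minimal sketch of that session-reuse approach (the pool sizes and the `head()` helper are assumptions for the example, not DVC's actual code):

```python
import requests
from requests.adapters import HTTPAdapter

# One shared Session: connections are pooled, so sockets (and thus file
# descriptors) are reused across requests instead of opened per call.
session = requests.Session()
adapter = HTTPAdapter(pool_connections=16, pool_maxsize=16)
session.mount("http://", adapter)
session.mount("https://", adapter)

def head(url, **kwargs):
    # Hypothetical helper replacing direct requests.request("HEAD", ...)
    # calls, each of which builds and tears down its own Session.
    return session.request("HEAD", url, **kwargs)
```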
Reproduction steps for Linux: script
The number of max connections there needs to be changed to some large value; for me, 10k worked.
@jorgeorpinel oops, sorry, wrong issue.
Little summary so far:

- The problem occurs in `remote/base.cache_exists`.
- `fetch`/`pull` does not have the same problems as `import`/`get`.

Possible way of handling the problem: the issue might be triggered because a `requests.sessions.Session` object is created upon each `requests.request` call. Maybe we could solve that by creating our own `Session` object, mounting proper `HTTPAdapter`s, and reusing that session instead of calling `requests.request` each time.

It seems that the problem is that, with every request sent, we reserve a socket through the `requests` API, which takes up an open file descriptor slot. In this particular case, in the method `RemoteLOCAL.cache_exists` we make a lot of `HEAD` calls in parallel, which leads to exceeding the open file descriptor limit. Example: set `ulimit -n 16` and run the script below.
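The original script isn't preserved in this thread; the following is a minimal sketch of the kind of reproduction described (the URL, worker count, and request count are assumptions):

```python
# Hypothetical repro: many parallel HEAD requests, each going through
# requests.request, so every call opens its own socket/fd. Under a low
# fd limit (ulimit -n 16) this should fail with "Too many open files".
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://remote.dvc.org/dataset-registry/"  # any reachable endpoint

def head_once(_):
    return requests.request("HEAD", URL).status_code

with ThreadPoolExecutor(max_workers=64) as pool:
    print(list(pool.map(head_once, range(256))))
```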
Can reproduce on my Mac, but not on Linux.
@jorgeorpinel yes, please open a new UI issue!
@jorgeorpinel should it be `use-cases/cats-dogs`?
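If so, the corrected invocation would presumably be the following, matching the use-cases/cats-dogs.dvc file mentioned in the report above:

```
$ dvc import --rev 0547f58 \
      git@github.com:iterative/dataset-registry.git \
      use-cases/cats-dogs
```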