dvc: RuntimeError: can't start new thread with `dvc import-url`
Bug Report
Issue at hand
I cannot run `dvc import-url`. I tried both the pip and conda builds of dvc, with Python 3.7 and Python 3.8, but I keep getting this error:
ERROR: unexpected error - can't start new thread
I tried to import data from a local URL (dvc repo) on a shared cluster. I am running dvc on an HPC cluster where users have limits on their threads - maybe it’s caused by this? My nproc soft and hard limits are both 4096.
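To check the thread-limit hypothesis, the limits can be read from inside Python. On Linux, every thread counts against the per-user `RLIMIT_NPROC` limit, and the count covers all of the user's processes on the node, not only dvc. The sketch below is a minimal check; the helper name `describe_nproc_limit` is made up for illustration and is not part of dvc.

```python
import resource
import threading


def describe_nproc_limit():
    """Return the soft/hard RLIMIT_NPROC values and this process's thread count.

    Hypothetical helper for illustration; not part of dvc.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    def fmt(value):
        # RLIM_INFINITY means no limit is enforced
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return {
        "soft": fmt(soft),
        "hard": fmt(hard),
        "threads_in_this_process": threading.active_count(),
    }


if __name__ == "__main__":
    print(describe_nproc_limit())
```

If the soft limit is well above what a single dvc run could spawn, the limit is more likely being hit by the sum of everything the user runs on that node.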
Steps to reproduce:
- create three directories: `dvc_data_registry`, `dvc_local_cache`, `dvc_local_repo`
- in `dvc_data_registry` do `git init` and `dvc init`; configure dvc repo location to `dvc_local_repo` and cache to `dvc_local_cache`
- create random files in `dvc_data_registry/data/imgs`, 2000 random files. For example, I used this.
- `dvc add dvc_data_registry/data/imgs`
- `git add -A` and `git commit -m "init"`
- create a new directory `example_data_science_project` and in it create a `data` directory
- in `example_data_science_project` do `git init` and `dvc init`
- do `dvc import-url dvc_data_registry/data/imgs ./data/ -v` in `example_data_science_project`
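The "create random files" step above could be scripted roughly as follows. This is only a sketch: the function name, the 1 KiB file size, and the zero-padded `fileNNN.bin` naming (chosen to resemble the names in the debug log) are assumptions, not the script originally used.

```python
import os
from pathlib import Path


def make_random_files(target_dir, count=2000, size_bytes=1024):
    """Fill target_dir with `count` files of random bytes.

    Hypothetical helper; file size and naming scheme are assumptions.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        # file000.bin, file001.bin, ..., file1999.bin
        (target / f"file{i:03d}.bin").write_bytes(os.urandom(size_bytes))
    return count


if __name__ == "__main__":
    make_random_files("dvc_data_registry/data/imgs")
```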
Output of `dvc version`:
$ dvc version
DVC version: 1.11.2 (conda)
---------------------------------
Platform: Python 3.7.8 on Linux-3.10.0-1127.19.1.el7.x86_64-x86_64-with-centos-7.8.2003-Core
Supports: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache types: hardlink, symlink
Caches: local
Remotes: None
Repo: dvc, git
Additional Information (if any):
Here is the output of this command with --verbose set:
(/gpfs/hpc/home/myuser/conda_dvc2) [myuser@cluster debug_example_project]$ dvc import-url /gpfs/hpc/home/myuser/debug_dvc_registry/data/imgs ./data/ -v
2020-12-03 21:23:54,114 DEBUG: Check for update is enabled.
2020-12-03 21:23:54,124 DEBUG: fetched: [(3,)]
2020-12-03 21:23:55,359 DEBUG: Removing output 'data/imgs' of stage: 'data/imgs.dvc'.
2020-12-03 21:23:55,359 DEBUG: Removing '/gpfs/hpc/home/myuser/debug_example_project/data/imgs'
Importing '/gpfs/hpc/home/myuser/debug_dvc_registry/data/imgs' -> 'data/imgs'
2020-12-03 21:23:55,394 DEBUG: Computed stage: 'data/imgs.dvc' md5: '4eed72597cbae32b38642fcfe9ab6048'
2020-12-03 21:23:55,394 DEBUG: 'md5' of stage: 'data/imgs.dvc' changed.
2020-12-03 21:23:56,599 DEBUG: Path '/gpfs/hpc/home/myuser/debug_dvc_registry/data/imgs' inode '129044738'
2020-12-03 21:23:56,600 DEBUG: fetched: [('9f96693c42f5bb6ecc3b090018dde2cf', '34543513', '3f0aff1ab97252c0486820992ed9fb25.dir', '1607023419108258816')]
2020-12-03 21:23:56,657 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1368.bin' to 'data/imgs/file1368.bin'
2020-12-03 21:23:56,657 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1632.bin' to 'data/imgs/file1632.bin'
2020-12-03 21:23:56,657 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file720.bin' to 'data/imgs/file720.bin'
2020-12-03 21:23:56,657 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file730.bin' to 'data/imgs/file730.bin'
2020-12-03 21:23:56,657 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file109.bin' to 'data/imgs/file109.bin'
2020-12-03 21:23:56,658 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file085.bin' to 'data/imgs/file085.bin'
2020-12-03 21:23:56,658 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file793.bin' to 'data/imgs/file793.bin'
2020-12-03 21:23:56,658 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file940.bin' to 'data/imgs/file940.bin'
2020-12-03 21:23:56,659 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file964.bin' to 'data/imgs/file964.bin'
2020-12-03 21:23:56,660 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file849.bin' to 'data/imgs/file849.bin'
2020-12-03 21:23:56,661 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file742.bin' to 'data/imgs/file742.bin'
2020-12-03 21:23:56,661 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1290.bin' to 'data/imgs/file1290.bin'
2020-12-03 21:23:56,663 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1326.bin' to 'data/imgs/file1326.bin'
2020-12-03 21:23:56,663 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1514.bin' to 'data/imgs/file1514.bin'
2020-12-03 21:23:56,663 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file169.bin' to 'data/imgs/file169.bin'
2020-12-03 21:23:56,664 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file148.bin' to 'data/imgs/file148.bin'
2020-12-03 21:23:56,666 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file489.bin' to 'data/imgs/file489.bin'
2020-12-03 21:23:56,666 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file515.bin' to 'data/imgs/file515.bin'
2020-12-03 21:23:56,666 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1274.bin' to 'data/imgs/file1274.bin'
2020-12-03 21:23:56,668 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1216.bin' to 'data/imgs/file1216.bin'
2020-12-03 21:23:56,668 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file121.bin' to 'data/imgs/file121.bin'
2020-12-03 21:23:56,669 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file050.bin' to 'data/imgs/file050.bin'
2020-12-03 21:23:56,671 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1172.bin' to 'data/imgs/file1172.bin'
2020-12-03 21:23:56,672 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1173.bin' to 'data/imgs/file1173.bin'
2020-12-03 21:23:56,673 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file876.bin' to 'data/imgs/file876.bin'
2020-12-03 21:23:56,675 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file626.bin' to 'data/imgs/file626.bin'
2020-12-03 21:23:56,675 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file132.bin' to 'data/imgs/file132.bin'
2020-12-03 21:23:56,675 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file556.bin' to 'data/imgs/file556.bin'
2020-12-03 21:23:56,675 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1189.bin' to 'data/imgs/file1189.bin'
2020-12-03 21:23:56,677 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1716.bin' to 'data/imgs/file1716.bin'
2020-12-03 21:23:56,678 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file039.bin' to 'data/imgs/file039.bin'
2020-12-03 21:23:56,679 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file764.bin' to 'data/imgs/file764.bin'
2020-12-03 21:23:56,681 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file573.bin' to 'data/imgs/file573.bin'
2020-12-03 21:23:56,682 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file513.bin' to 'data/imgs/file513.bin'
2020-12-03 21:23:56,683 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1618.bin' to 'data/imgs/file1618.bin'
2020-12-03 21:23:56,685 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file758.bin' to 'data/imgs/file758.bin'
2020-12-03 21:23:56,686 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file250.bin' to 'data/imgs/file250.bin'
2020-12-03 21:23:56,686 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file988.bin' to 'data/imgs/file988.bin'
2020-12-03 21:23:56,689 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1162.bin' to 'data/imgs/file1162.bin'
2020-12-03 21:23:56,689 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file737.bin' to 'data/imgs/file737.bin'
2020-12-03 21:23:56,690 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file947.bin' to 'data/imgs/file947.bin'
2020-12-03 21:23:56,768 DEBUG: fetched: [(2,)]
2020-12-03 21:23:56,827 ERROR: unexpected error - can't start new thread
------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/main.py", line 90, in main
    ret = cmd.run()
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/command/imp_url.py", line 19, in run
    desc=self.args.desc,
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/repo/__init__.py", line 54, in wrapper
    return f(repo, *args, **kwargs)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/repo/imp_url.py", line 64, in imp_url
    stage.run()
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/stage/decorators.py", line 36, in rwlocked
    return call()
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/stage/__init__.py", line 500, in run
    sync_import(self, dry, force)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/stage/imports.py", line 30, in sync_import
    stage.deps[0].download(stage.outs[0])
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/output/base.py", line 337, in download
    self.tree.download(self.path_info, to.path_info)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/tree/base.py", line 409, in download
    from_info, to_info, name, no_progress_bar, file_mode, dir_mode
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/tree/base.py", line 441, in _download_dir
    for from_info, to_info in zip(from_infos, to_infos)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/tree/base.py", line 441, in <listcomp>
    for from_info, to_info in zip(from_infos, to_infos)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/concurrent/futures/thread.py", line 172, in submit
    self._adjust_thread_count()
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/concurrent/futures/thread.py", line 193, in _adjust_thread_count
    t.start()
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/threading.py", line 852, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
------------------------------------------------------------
2020-12-03 21:24:01,038 DEBUG: Version info for developers:
DVC version: 1.11.2 (conda)
---------------------------------
Platform: Python 3.7.8 on Linux-3.10.0-1127.19.1.el7.x86_64-x86_64-with-centos-7.8.2003-Core
Supports: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache types: hardlink, symlink
Caches: local
Remotes: None
Repo: dvc, git
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2020-12-03 21:24:01,042 DEBUG: Analytics is enabled.
2020-12-03 21:24:03,394 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp9wgcp0gp']'
2020-12-03 21:24:03,400 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp9wgcp0gp']'
(/gpfs/hpc/home/myuser/conda_dvc2) [myuser@cluster debug_example_project]$ dvc version
DVC version: 1.11.2 (conda)
---------------------------------
Platform: Python 3.7.8 on Linux-3.10.0-1127.19.1.el7.x86_64-x86_64-with-centos-7.8.2003-Core
Supports: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache types: hardlink, symlink
Caches: local
Remotes: None
Repo: dvc, git
In addition, I was also having problems with `dvc push`: I got the `RuntimeError: can't start new thread` error again. But with push I used the `--jobs 1` option and then it worked.
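The traceback ends in `ThreadPoolExecutor.submit`, which starts worker threads as tasks are submitted, up to the pool's `max_workers`. That would explain why capping the job count helps. The sketch below shows the effect with the standard library only; `peak_worker_threads` and `fake_download` are made-up names, and how dvc maps `--jobs` onto its pool is an assumption here, not something the traceback shows.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_worker_threads(jobs, tasks=40):
    """Run `tasks` short tasks in a pool capped at `jobs` workers.

    Returns how many distinct worker threads handled the tasks.
    Hypothetical stand-in for a parallel download loop.
    """
    seen = set()
    lock = threading.Lock()

    def fake_download(_):
        with lock:
            seen.add(threading.get_ident())
        time.sleep(0.01)  # stand-in for copying one file

    # Worker threads stay alive until the pool shuts down,
    # so each distinct ident is one thread the pool started.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        list(pool.map(fake_download, range(tasks)))
    return len(seen)


if __name__ == "__main__":
    for jobs in (1, 4, 16):
        print(jobs, "->", peak_worker_threads(jobs))
```

With `jobs=1` the pool never needs more than one worker thread, which is the smallest footprint a pooled transfer can have.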
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (10 by maintainers)
Commits related to this issue
- dvc: bump to 2.0.0a0 Looking at the log is confusing, as the version is stuck in `1.11.0` whereas the recent one is `1.11.10`. As the updater is also dependent on that version, it prints an ugly u... — committed to skshetry/dvc by skshetry 3 years ago
- dvc: bump to 2.0.0a0 (#5250) Looking at the log is confusing, as the version is stuck in `1.11.0` whereas the recent one is `1.11.10`. As the updater is also dependent on that version, it prints a... — committed to iterative/dvc by skshetry 3 years ago
Thanks! I’ll test it ASAP. Probably in the next few days.
Yes, `--jobs` would most probably do the trick for me.
Seems like this just needs the changes from https://github.com/iterative/dvc/pull/4977 added to `dvc import-url`.
For push/pull we also have the `remote.jobs` config option now, so it would probably make sense to have the same thing for import/import-url as well.
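One way the suggested fallback could look, purely as an illustration: an explicit `--jobs` value wins, then a `jobs` entry from config, then a built-in default. The function `resolve_jobs`, the flat config mapping, and the default value are all hypothetical; this is not dvc's actual code.

```python
from typing import Mapping, Optional


def resolve_jobs(cli_jobs: Optional[int], config: Mapping[str, int], default: int) -> int:
    """Pick the worker count: CLI flag, then config entry, then default.

    Hypothetical sketch of the precedence discussed above.
    """
    if cli_jobs is not None:
        jobs = cli_jobs
    elif "jobs" in config:
        jobs = int(config["jobs"])
    else:
        jobs = default
    if jobs < 1:
        raise ValueError("jobs must be a positive integer")
    return jobs


if __name__ == "__main__":
    print(resolve_jobs(1, {"jobs": 8}, default=16))     # CLI flag wins
    print(resolve_jobs(None, {"jobs": 8}, default=16))  # falls back to config
    print(resolve_jobs(None, {}, default=16))           # falls back to default
```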