dvc: RuntimeError: can't start new thread with `dvc import-url`

Bug Report

Issue at hand

I cannot run `dvc import-url`. I tried both the pip and conda builds of DVC, with Python 3.7 and Python 3.8, but I always get this error: `ERROR: unexpected error - can't start new thread`

I am trying to import data from a local URL (a DVC repo) on a shared HPC cluster where the number of threads per user is limited. Could that be the cause? My nproc soft and hard limits are both 4096.
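To check the limit the `dvc` process actually runs under, Python's standard `resource` module can read `RLIMIT_NPROC` directly. On Linux this limit counts threads, not only processes, for the user's real user ID across all of that user's processes, so other sessions on the same login node also count toward the 4096. A minimal sketch, assuming a Linux/Unix host (`resource` is unavailable on Windows); `nproc_limits` is a hypothetical helper name:

```python
import resource


def nproc_limits():
    """Return the (soft, hard) RLIMIT_NPROC limits of this process.

    resource.RLIM_INFINITY is reported as None ("unlimited").
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    def normalize(value):
        return None if value == resource.RLIM_INFINITY else value

    return normalize(soft), normalize(hard)


def describe(value):
    # Human-readable form of a single limit value.
    return "unlimited" if value is None else str(value)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"nproc soft limit: {describe(soft)}, hard limit: {describe(hard)}")
```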

Steps to reproduce:

  1. create three directories: `dvc_data_registry`, `dvc_local_cache`, `dvc_local_repo`
  2. in `dvc_data_registry`, run `git init` and `dvc init`; configure the DVC repo location to `dvc_local_repo` and the cache to `dvc_local_cache`
  3. create 2000 random files in `dvc_data_registry/data/imgs`. For example, I used this .
  4. `dvc add dvc_data_registry/data/imgs`
  5. `git add -A` and `git commit -m "init"`
  6. create a new directory `example_data_science_project` and, inside it, a `data` directory
  7. in `example_data_science_project`, run `git init` and `dvc init`
  8. in `example_data_science_project`, run `dvc import-url dvc_data_registry/data/imgs ./data/ -v`
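The generator referenced in step 3 is not reproduced here. As a stand-in, this sketch writes `count` files of random bytes named like those in the verbose log (`file000.bin` up to `file1999.bin`; the three-digit zero padding is inferred from names such as `file039.bin`). The 1 KiB file size is an assumption, and `make_random_files` is a hypothetical helper:

```python
import os
from pathlib import Path


def make_random_files(target_dir, count=2000, size=1024):
    """Create `count` files of `size` random bytes in `target_dir`.

    Names follow the pattern seen in the log (file000.bin, file1368.bin, ...);
    the default size is an arbitrary assumption.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        (target / f"file{i:03d}.bin").write_bytes(os.urandom(size))
    return target
```

Calling `make_random_files("dvc_data_registry/data/imgs")` before step 4 should give a directory of the same shape as the one in the report.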

Output of dvc version:

$ dvc version
DVC version: 1.11.2 (conda)
---------------------------------
Platform: Python 3.7.8 on Linux-3.10.0-1127.19.1.el7.x86_64-x86_64-with-centos-7.8.2003-Core
Supports: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache types: hardlink, symlink
Caches: local
Remotes: None
Repo: dvc, git

Additional Information (if any):

Here is the output of this command with --verbose set:

(/gpfs/hpc/home/myuser/conda_dvc2) [myuser@cluster debug_example_project]$ dvc import-url /gpfs/hpc/home/myuser/debug_dvc_registry/data/imgs ./data/ -v
2020-12-03 21:23:54,114 DEBUG: Check for update is enabled.
2020-12-03 21:23:54,124 DEBUG: fetched: [(3,)]                        
2020-12-03 21:23:55,359 DEBUG: Removing output 'data/imgs' of stage: 'data/imgs.dvc'.
2020-12-03 21:23:55,359 DEBUG: Removing '/gpfs/hpc/home/myuser/debug_example_project/data/imgs'
Importing '/gpfs/hpc/home/myuser/debug_dvc_registry/data/imgs' -> 'data/imgs'
2020-12-03 21:23:55,394 DEBUG: Computed stage: 'data/imgs.dvc' md5: '4eed72597cbae32b38642fcfe9ab6048'
2020-12-03 21:23:55,394 DEBUG: 'md5' of stage: 'data/imgs.dvc' changed.
2020-12-03 21:23:56,599 DEBUG: Path '/gpfs/hpc/home/myuser/debug_dvc_registry/data/imgs' inode '129044738'
2020-12-03 21:23:56,600 DEBUG: fetched: [('9f96693c42f5bb6ecc3b090018dde2cf', '34543513', '3f0aff1ab97252c0486820992ed9fb25.dir', '1607023419108258816')]
2020-12-03 21:23:56,657 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1368.bin' to 'data/imgs/file1368.bin'                                                                                                                                                       
2020-12-03 21:23:56,657 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1632.bin' to 'data/imgs/file1632.bin'                                                                                                                                                       
2020-12-03 21:23:56,657 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file720.bin' to 'data/imgs/file720.bin'                                                                                                                                                         
2020-12-03 21:23:56,657 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file730.bin' to 'data/imgs/file730.bin'                                                                                                                                                         
2020-12-03 21:23:56,657 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file109.bin' to 'data/imgs/file109.bin'                                                                                                                                                         
2020-12-03 21:23:56,658 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file085.bin' to 'data/imgs/file085.bin'                                                                                                                                                         
2020-12-03 21:23:56,658 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file793.bin' to 'data/imgs/file793.bin'                                                                                                                                                         
2020-12-03 21:23:56,658 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file940.bin' to 'data/imgs/file940.bin'                                                                                                                                                         
2020-12-03 21:23:56,659 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file964.bin' to 'data/imgs/file964.bin'                                                                                                                                                         
2020-12-03 21:23:56,660 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file849.bin' to 'data/imgs/file849.bin'                                                                                                                                                         
2020-12-03 21:23:56,661 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file742.bin' to 'data/imgs/file742.bin'                                                                                                                                                         
2020-12-03 21:23:56,661 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1290.bin' to 'data/imgs/file1290.bin'                                                                                                                                                       
2020-12-03 21:23:56,663 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1326.bin' to 'data/imgs/file1326.bin'                                                                                                                                                       
2020-12-03 21:23:56,663 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1514.bin' to 'data/imgs/file1514.bin'                                                                                                                                                       
2020-12-03 21:23:56,663 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file169.bin' to 'data/imgs/file169.bin'                                                                                                                                                         
2020-12-03 21:23:56,664 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file148.bin' to 'data/imgs/file148.bin'                                                                                                                                                         
2020-12-03 21:23:56,666 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file489.bin' to 'data/imgs/file489.bin'                                                                                                                                                         
2020-12-03 21:23:56,666 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file515.bin' to 'data/imgs/file515.bin'                                                                                                                                                         
2020-12-03 21:23:56,666 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1274.bin' to 'data/imgs/file1274.bin'                                                                                                                                                       
2020-12-03 21:23:56,668 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1216.bin' to 'data/imgs/file1216.bin'                                                                                                                                                       
2020-12-03 21:23:56,668 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file121.bin' to 'data/imgs/file121.bin'                                                                                                                                                         
2020-12-03 21:23:56,669 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file050.bin' to 'data/imgs/file050.bin'                                                                                                                                                         
2020-12-03 21:23:56,671 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1172.bin' to 'data/imgs/file1172.bin'                                                                                                                                                       
2020-12-03 21:23:56,672 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1173.bin' to 'data/imgs/file1173.bin'                                                                                                                                                       
2020-12-03 21:23:56,673 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file876.bin' to 'data/imgs/file876.bin'                                                                                                                                                         
2020-12-03 21:23:56,675 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file626.bin' to 'data/imgs/file626.bin'                                                                                                                                                         
2020-12-03 21:23:56,675 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file132.bin' to 'data/imgs/file132.bin'                                                                                                                                                         
2020-12-03 21:23:56,675 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file556.bin' to 'data/imgs/file556.bin'                                                                                                                                                         
2020-12-03 21:23:56,675 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1189.bin' to 'data/imgs/file1189.bin'                                                                                                                                                       
2020-12-03 21:23:56,677 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1716.bin' to 'data/imgs/file1716.bin'                                                                                                                                                       
2020-12-03 21:23:56,678 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file039.bin' to 'data/imgs/file039.bin'                                                                                                                                                         
2020-12-03 21:23:56,679 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file764.bin' to 'data/imgs/file764.bin'                                                                                                                                                         
2020-12-03 21:23:56,681 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file573.bin' to 'data/imgs/file573.bin'                                                                                                                                                         
2020-12-03 21:23:56,682 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file513.bin' to 'data/imgs/file513.bin'                                                                                                                                                         
2020-12-03 21:23:56,683 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1618.bin' to 'data/imgs/file1618.bin'                                                                                                                                                       
2020-12-03 21:23:56,685 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file758.bin' to 'data/imgs/file758.bin'                                                                                                                                                         
2020-12-03 21:23:56,686 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file250.bin' to 'data/imgs/file250.bin'                                                                                                                                                         
2020-12-03 21:23:56,686 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file988.bin' to 'data/imgs/file988.bin'                                                                                                                                                         
2020-12-03 21:23:56,689 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file1162.bin' to 'data/imgs/file1162.bin'                                                                                                                                                       
2020-12-03 21:23:56,689 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file737.bin' to 'data/imgs/file737.bin'                                                                                                                                                         
2020-12-03 21:23:56,690 DEBUG: Downloading '../debug_dvc_registry/data/imgs/file947.bin' to 'data/imgs/file947.bin'                                                                                                                                                         
2020-12-03 21:23:56,768 DEBUG: fetched: [(2,)]                                                                                                                                                                                                                              
2020-12-03 21:23:56,827 ERROR: unexpected error - can't start new thread
------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/main.py", line 90, in main
    ret = cmd.run()
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/command/imp_url.py", line 19, in run
    desc=self.args.desc,
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/repo/__init__.py", line 54, in wrapper
    return f(repo, *args, **kwargs)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/repo/imp_url.py", line 64, in imp_url
    stage.run()
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/stage/decorators.py", line 36, in rwlocked
    return call()
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/stage/__init__.py", line 500, in run
    sync_import(self, dry, force)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/stage/imports.py", line 30, in sync_import
    stage.deps[0].download(stage.outs[0])
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/output/base.py", line 337, in download
    self.tree.download(self.path_info, to.path_info)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/tree/base.py", line 409, in download
    from_info, to_info, name, no_progress_bar, file_mode, dir_mode
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/tree/base.py", line 441, in _download_dir
    for from_info, to_info in zip(from_infos, to_infos)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/site-packages/dvc/tree/base.py", line 441, in <listcomp>
    for from_info, to_info in zip(from_infos, to_infos)
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/concurrent/futures/thread.py", line 172, in submit
    self._adjust_thread_count()
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/concurrent/futures/thread.py", line 193, in _adjust_thread_count
    t.start()
  File "/gpfs/hpc/home/myuser/conda_dvc2/lib/python3.7/threading.py", line 852, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
------------------------------------------------------------
2020-12-03 21:24:01,038 DEBUG: Version info for developers:
DVC version: 1.11.2 (conda)
---------------------------------
Platform: Python 3.7.8 on Linux-3.10.0-1127.19.1.el7.x86_64-x86_64-with-centos-7.8.2003-Core
Supports: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache types: hardlink, symlink
Caches: local
Remotes: None
Repo: dvc, git
 
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2020-12-03 21:24:01,042 DEBUG: Analytics is enabled.
2020-12-03 21:24:03,394 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp9wgcp0gp']'
2020-12-03 21:24:03,400 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp9wgcp0gp']'
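The traceback ends inside `concurrent.futures.ThreadPoolExecutor.submit`. In Python 3.7, as used here, the executor starts a new worker thread on each `submit` until it reaches `max_workers`, and the `RuntimeError` is raised when the OS refuses to create one more thread. The sketch below is not DVC's `_download_dir`; it is a hypothetical `copy_dir` that follows the same pattern (one `submit` per file), with `shutil.copy2` standing in for the download, and caps the pool with a `jobs` value so the thread count stays bounded no matter how many files the directory holds:

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_dir(src, dst, jobs=1):
    """Copy every file under `src` into `dst` using at most `jobs` threads.

    Hypothetical stand-in for a directory download: one task per file,
    but the pool never grows beyond `jobs` worker threads.
    """
    if jobs < 1:
        raise ValueError("jobs must be >= 1")
    src, dst = Path(src), Path(dst)
    pairs = [
        (path, dst / path.relative_to(src))
        for path in src.rglob("*")
        if path.is_file()
    ]
    # Create the destination tree up front so workers only copy files.
    for _, target in pairs:
        target.parent.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=jobs) as executor:
        futures = [executor.submit(shutil.copy2, s, t) for s, t in pairs]
        for future in futures:
            future.result()  # re-raise any copy error in the caller
    return len(pairs)
```

With `jobs=1` the copy runs sequentially on a single worker thread, which is consistent with the `--jobs 1` workaround for `dvc push` mentioned next.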

I also had problems with `dvc push`: it raised the same `RuntimeError: can't start new thread`. With push, however, I could pass `--jobs 1`, and then it worked.
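One way to see why a lower job count helps is to record which threads actually execute the tasks. In this sketch (`threads_used` is a hypothetical helper; the short sleep only keeps tasks overlapping so the pool grows if it is allowed to), many tasks are submitted, yet the number of distinct worker threads never exceeds `max_workers`:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def threads_used(tasks, jobs):
    """Run `tasks` trivial jobs on a pool of `jobs` workers.

    Returns how many distinct threads executed at least one task.
    """
    seen = set()
    lock = threading.Lock()

    def task():
        with lock:
            seen.add(threading.get_ident())
        time.sleep(0.001)  # keep tasks overlapping so the pool can grow

    with ThreadPoolExecutor(max_workers=jobs) as executor:
        for future in [executor.submit(task) for _ in range(tasks)]:
            future.result()
    return len(seen)


if __name__ == "__main__":
    for jobs in (1, 4, 32):
        print(f"jobs={jobs}: {threads_used(200, jobs)} worker threads")
```

The pool size, not the number of files, determines how many threads count against the per-user limit.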

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 18 (10 by maintainers)


Most upvoted comments

@Fotomaterjal, could you check with the latest master version (see below for installation)?

We have improved a few things internally that should make it use fewer threads than before. Thanks.

$ pip install git+https://github.com/iterative/dvc#egg=dvc

Thanks! I’ll test it ASAP, probably in the next few days.

@Fotomaterjal, would dvc get --jobs work for you? We should be able to implement this option after #4977 is done.

Yes, --jobs would most probably do the trick for me.

It seems this just needs the changes from https://github.com/iterative/dvc/pull/4977 to be added to dvc import-url.

For push/pull we now also have the remote.jobs config option; it would probably make sense to have the same thing for import/import-url as well.
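If import-url gained both a `--jobs` flag and a config fallback, one plausible precedence is: an explicit command-line value wins, then a configured value such as `remote.jobs`, then a default. The sketch below is hypothetical; `resolve_jobs` is not a DVC function, and the fallback of 4 × CPU count is an assumption, not DVC's documented default:

```python
import os


def resolve_jobs(cli_jobs=None, config_jobs=None, default=None):
    """Pick the number of worker threads for a transfer.

    Hypothetical precedence: explicit --jobs value, then a configured
    value (which may arrive as a string), then `default`, then
    4 * CPU count as an assumed last resort.
    """
    for value in (cli_jobs, config_jobs, default):
        if value is not None:
            value = int(value)
            if value < 1:
                raise ValueError(f"jobs must be >= 1, got {value}")
            return value
    return 4 * (os.cpu_count() or 1)
```

On a node with a per-user thread cap, setting the config value once would avoid having to pass the flag on every call.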