dvc: exp: "Too many open files" error in `exp`-related commands.
Bug Report
Sometimes on macOS, we hit the "too many open files" error. It can happen in `exp run`, in `exp show`, or inside the celery worker.
When it happens also varies: I have hit this error during setup, while running, and during result collection. Raising the limit with `ulimit` to a larger number (for example, 1024 instead of the default 256) prevents it.
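For reference, the same limit that `ulimit -n` adjusts can be inspected and raised from inside a Python process with the standard `resource` module. This is only a minimal sketch of the workaround, not something DVC itself is known to do; the helper name `raise_nofile_soft_limit` and the target of 1024 are illustrative assumptions.

```python
import resource


def raise_nofile_soft_limit(target=1024):
    """Try to raise the soft open-file limit to `target` (hypothetical helper).

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unlimited soft limit needs no change.
    if soft == resource.RLIM_INFINITY:
        return soft

    # The soft limit can never exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)

    # Already high enough: leave it alone.
    if soft >= target:
        return soft

    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # The OS refused the new value; keep the existing limit.
        return soft
    return target


if __name__ == "__main__":
    print("soft limit now:", raise_nofile_soft_limit(1024))
```

The change only affects the current process and its children, much like running `ulimit -n 1024` in the shell that launches `dvc`.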
Traceback (most recent call last):
File "/Users/gao/Code/dvc/dvc/repo/experiments/queue/tasks.py", line 66, in collect_exp
BaseStashQueue.collect_executor(
File "/Users/gao/Code/dvc/dvc/repo/experiments/queue/base.py", line 643, in collect_executor
results = cls.collect_git(exp, executor, exec_result)
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/funcy/decorators.py", line 45, in wrapper
return deco(call, *dargs, **dkwargs)
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/funcy/flow.py", line 127, in retry
return call()
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/funcy/decorators.py", line 66, in __call__
return self._func(*self._args, **self._kwargs)
File "/Users/gao/Code/dvc/dvc/repo/experiments/utils.py", line 40, in wrapper
return f(exp, *args, **kwargs)
File "/Users/gao/Code/dvc/dvc/repo/experiments/queue/base.py", line 623, in collect_git
for ref in executor.fetch_exps(
File "/Users/gao/Code/dvc/dvc/repo/experiments/executor/base.py", line 367, in fetch_exps
dest_scm.fetch_refspecs(
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/scmrepo/git/__init__.py", line 289, in _backend_func
result = func(*args, **kwargs)
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 634, in fetch_refspecs
fetch_result = client.fetch(
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/dulwich/client.py", line 1502, in fetch
refs = r.fetch(
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/dulwich/repo.py", line 427, in fetch
count, pack_data = self.fetch_pack_data(
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/dulwich/repo.py", line 460, in fetch_pack_data
objects = self.fetch_objects(
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/dulwich/repo.py", line 494, in fetch_objects
obj = self.object_store[sha]
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/dulwich/object_store.py", line 144, in __getitem__
type_num, uncomp = self.get_raw(sha)
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/dulwich/object_store.py", line 581, in get_raw
ret = self._get_loose_object(hexsha)
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/dulwich/object_store.py", line 745, in _get_loose_object
return ShaFile.from_path(path)
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/dulwich/objects.py", line 420, in from_path
with GitFile(path, "rb") as f:
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/dulwich/file.py", line 94, in GitFile
return io.open(filename, mode, bufsize)
OSError: [Errno 24] Too many open files: '/Users/gao/test/vscode-dvc/demo/.dvc/tmp/exps/tmp9x5praul/.git/objects/55/4aa474348369ca4eec2945226686b9e4f11666'
[2022-10-25 16:25:09,522: ERROR/MainProcess] Task dvc.repo.experiments.queue.tasks.run_exp[cf2fd3f4-a0c2-4c42-80b9-9554e03d9a55] raised unexpected: OSError(24, 'Too many open files')
Traceback (most recent call last):
File "/opt/homebrew/Cellar/python@3.10/3.10.6_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/shutil.py", line 628, in _rmtree_safe_fd
with os.scandir(topfd) as scandir_it:
OSError: [Errno 24] Too many open files: '/Users/gao/test/vscode-dvc/demo/.dvc/tmp/exps/tmp9x5praul/demo'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/celery/app/trace.py", line 451, in trace_task
R = retval = fun(*args, **kwargs)
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/celery/app/trace.py", line 734, in __protected_call__
return self.run(*args, **kwargs)
File "/Users/gao/Code/dvc/dvc/repo/experiments/queue/tasks.py", line 111, in run_exp
cleanup_exp.s(executor, infofile)()
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/celery/canvas.py", line 168, in __call__
return self.type(*args, **kwargs)
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/celery/app/trace.py", line 735, in __protected_call__
return orig(self, *args, **kwargs)
File "/Users/gao/test/vscode-dvc/demo/.venv/lib/python3.10/site-packages/celery/app/task.py", line 392, in __call__
return self.run(*args, **kwargs)
File "/Users/gao/Code/dvc/dvc/repo/experiments/queue/tasks.py", line 87, in cleanup_exp
executor.cleanup(infofile)
File "/Users/gao/Code/dvc/dvc/repo/experiments/executor/local.py", line 128, in cleanup
remove(self.root_dir)
File "/Users/gao/Code/dvc/dvc/utils/fs.py", line 69, in remove
shutil.rmtree(path, onerror=_chmod)
File "/opt/homebrew/Cellar/python@3.10/3.10.6_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/shutil.py", line 724, in rmtree
_rmtree_safe_fd(fd, path, onerror)
File "/opt/homebrew/Cellar/python@3.10/3.10.6_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/shutil.py", line 657, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
File "/opt/homebrew/Cellar/python@3.10/3.10.6_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/shutil.py", line 632, in _rmtree_safe_fd
onerror(os.scandir, path, sys.exc_info())
File "/Users/gao/Code/dvc/dvc/utils/fs.py", line 54, in _chmod
func(p)
OSError: [Errno 24] Too many open files: '/Users/gao/test/vscode-dvc/demo/.dvc/tmp/exps/tmp9x5praul/demo'
Description
Reproduce
Expected
Environment information
Output of `dvc doctor`:
$ dvc doctor
Additional Information (if any):
About this issue
- State: closed
- Created 2 years ago
- Comments: 18 (18 by maintainers)
@karajan1001 I’m curious whether anyone else on the vscode team has run into this same problem. Also, could you run `git count-objects -vH` in your clone of the vscode repo? (In your main clone of the repo, not in a temp exp workspace.)

What I am wondering is this: if you aren’t making regular changes in the vscode repo (and are only ever using `dvc exp ...` for testing purposes), you also aren’t using CLI git in that clone regularly, so it may never be gc’d at all, which could also lead to dulwich hitting the file limit when we try to collect exps. There could be too many loose objects on the receiving end of the fetch (in the main repo), and not necessarily too many objects on the source end (in an exp temp workspace). A typical user would not hit this scenario, because they would presumably be using regular CLI git alongside DVC often enough that git would perform a gc at some point.

I think we can close this for now, given that it is unlikely to occur in normal DVC use (where users are at least semi-regularly also using CLI git commands).
Yes, I never saw this error after I `gc`ed the repo.
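To check whether a clone is in the state described above, the number of loose objects can be estimated by walking `.git/objects` directly. The sketch below is a rough stand-in for the `count` field of `git count-objects -v`, assuming the standard loose-object layout (two-hex-character fan-out directories); the function name `count_loose_objects` is hypothetical, and it counts every regular file in those directories without validating the file names.

```python
import os
import string

HEX_CHARS = set(string.hexdigits.lower())


def count_loose_objects(git_dir):
    """Count files under <git_dir>/objects/<xx>/, where xx is two hex characters.

    Directories such as `pack` and `info` are skipped because their names
    are not two-character hex prefixes.
    """
    objects_dir = os.path.join(git_dir, "objects")
    count = 0
    # Use scandir as a context manager so directory handles are closed promptly.
    with os.scandir(objects_dir) as fanout_dirs:
        for entry in fanout_dirs:
            is_fanout = (
                entry.is_dir()
                and len(entry.name) == 2
                and set(entry.name) <= HEX_CHARS
            )
            if not is_fanout:
                continue
            with os.scandir(entry.path) as objects:
                count += sum(1 for obj in objects if obj.is_file())
    return count


if __name__ == "__main__":
    git_dir = os.path.join(os.getcwd(), ".git")
    if os.path.isdir(os.path.join(git_dir, "objects")):
        print("loose objects:", count_loose_objects(git_dir))
```

If the count is large, running `git gc` in the main clone packs the loose objects, which is the step that made the error go away here.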