dvc: Running `exp queue status` is very slow
Running `exp queue status` is very slow.
I noticed that after submitting 200 run jobs to the queue, `dvc exp queue status` became very slow, on the order of 5 minutes to return a result.
dvc doctor:
DVC version: 2.36.0 (pip)
---------------------------------
Platform: Python 3.10.8 on Linux-6.0.11-arch1-1-x86_64-with-glibc2.36
Subprojects:
dvc_data = 0.28.3
dvc_objects = 0.14.0
dvc_render = 0.0.14
dvc_task = 0.1.6
dvclive = 1.1.0
scmrepo = 0.1.4
Supports:
azure (adlfs = 2022.10.0, knack = 0.10.0, azure-identity = 1.11.0),
http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb1
Caches: local
Remotes: azure, local
Workspace directory: ext4 on /dev/sdb1
Repo: dvc, git
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 20 (9 by maintainers)
I think it is important to provide instructions (or a command line comment) to help clean up the situation that @gregstarr described. I experienced the same issue and it made my DVC project directory unusable. I was able to recover, but I can’t remember exactly what I did. Perhaps the solution is in one of the comments above.
The main performance issue here is having too many celery message files (since we have to iterate over them for things like `queue status`). Garbage collection to clean up messages which are either expired or irrelevant is implemented in dvc-task now (see linked PR).
On the DVC end, we can add something like `exp clean` so users can force us to clean up things we know we don’t need, but for the celery messages in particular we can also just do it automatically in the background when queue workers exit.
@gregstarr removing `.dvc/tmp/exps` will not remove any experiments that have already been finished (but it will remove logs for those experiments).
I am seeing this problem as well.
I have 20-30 experiments on the commit, some queued, some failed, some finished successfully and some running.
`dvc exp show`, `dvc queue status`, and `dvc queue logs <task>` all take a long time, over 5 minutes.
Running from the command line on a Linux server (RHEL 7.4); not 100% sure what the storage configuration is, e.g. NAS, HDD, SSD, etc.
EDIT: just ran `dvc queue status`; turns out I had 68, and it took about 10 minutes to finish.
Here is the dump:
I had to rename it as a png to get it to upload, but it was generated as you requested above. Just change the extension back to `.prof`.
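
For anyone who wants to check whether a buildup of files under `.dvc/tmp/exps` is what makes `queue status` slow in their repo, here is a rough Python sketch that counts files per first-level subdirectory. The function name `count_files_by_subdir` is made up for illustration, and the script makes no assumption about how DVC lays out the directory internally; it only walks whatever is there.

```python
import os
from collections import Counter
from pathlib import Path


def count_files_by_subdir(root):
    """Count regular files under `root`, grouped by first-level entry.

    Hypothetical helper for illustration; not part of DVC.
    """
    root = Path(root)
    counts = Counter()
    # os.walk yields nothing if `root` does not exist, so the result is empty.
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root)
        # Files sitting directly in `root` are grouped under ".".
        top = rel.parts[0] if rel.parts else "."
        counts[top] += len(filenames)
    return counts


if __name__ == "__main__":
    # `.dvc/tmp/exps` is the directory named in this thread; run from the repo root.
    for name, n in count_files_by_subdir(".dvc/tmp/exps").most_common():
        print(f"{n:8d}  {name}")
```

A very large count in one subdirectory would be consistent with the message-file explanation given above, though it does not prove it.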
@behrica You could force a “wipe” by removing `.dvc/tmp/exps` (https://dvc.org/doc/user-guide/project-structure/internal-files#internal-directories-and-files) if you are certain that there is no important information there.
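
If you are not certain, a more cautious variant of the wipe is to rename the directory instead of deleting it, so the experiment logs mentioned above stay recoverable. The sketch below is only that: `set_aside` and the `.bak-<timestamp>` naming are hypothetical, renaming rather than removing is a substitution for the suggestion above, and it assumes no queue workers are running while the directory is moved.

```python
import shutil
import time
from pathlib import Path


def set_aside(path, suffix=None):
    """Rename `path` to `<name>.bak-<suffix>` next to it and return the new path.

    Hypothetical helper, not a DVC command. Returns None if `path` is missing.
    """
    path = Path(path)
    if not path.exists():
        return None
    # Default suffix is a local timestamp such as 20240101-120000.
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    backup = path.with_name(f"{path.name}.bak-{suffix}")
    if backup.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(str(backup))
    shutil.move(str(path), str(backup))
    return backup
```

Calling `set_aside(".dvc/tmp/exps")` from the repo root would leave a sibling backup directory that can be deleted once everything is confirmed to work.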