wandb: artifacts do not clean up .cache after upload
wandb --version && python --version && uname
- Weights and Biases version: 0.9.4
- Python version: 3.6.10
- Operating System: Win 10
Description
Artifact uploads in wandb runs don't seem to clean up their cache files properly.
As I run my ML experiments, the C:\Users\<Username>\.cache\wandb\artifacts folder grows until it exhausts all the free space on my drive (up to 50 GB). At that point, I have to clean it manually to resume my work.
The total size of all artifacts uploaded within a single run is about 5 GB (counting all versions of the same files), which is definitely less than the size of the .cache folder when I discover the problem.
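To compare the actual size of the artifacts cache with the roughly 5 GB a run uploads, a small stdlib-only script can sum the file sizes in the cache folder. This is a sketch under assumptions: it treats the WANDB_CACHE_DIR environment variable, when set, as replacing the default ~/.cache/wandb base folder, with artifacts cached in its artifacts subfolder (mirroring the path above). Check this against your installed SDK version.

```python
import os
from pathlib import Path


def artifacts_cache_dir() -> Path:
    """Guess the wandb artifacts cache location.

    Assumption: WANDB_CACHE_DIR, if set, replaces ~/.cache/wandb,
    and artifacts live in its "artifacts" subfolder.
    """
    base = os.environ.get("WANDB_CACHE_DIR")
    root = Path(base) if base else Path.home() / ".cache" / "wandb"
    return root / "artifacts"


def dir_size_bytes(path: Path) -> int:
    """Total size of regular files under `path` (0 if it does not exist)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(str(path)):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if os.path.islink(fp):
                continue  # don't count link targets twice
            try:
                total += os.path.getsize(fp)
            except OSError:
                pass  # file vanished or is locked while we walk
    return total


if __name__ == "__main__":
    cache = artifacts_cache_dir()
    print(f"{cache}: {dir_size_bytes(cache) / 1024**3:.2f} GB")
```

Running it before and after a few runs shows whether the cache grows by more than the uploaded data.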
Additional errors
At times I get these errors at the end of the run:
C:\ProgramData\Anaconda3\lib\site-packages\wandb\compat\tempfile.py:64: UserWarning: Couldn't remove temp directory C:\Users\Maria\AppData\Local\Temp\tmp7apg1zqmwandb
_warnings.warn("Couldn't remove temp directory %s" % name)
C:\ProgramData\Anaconda3\lib\site-packages\wandb\compat\tempfile.py:64: UserWarning: Couldn't remove temp directory C:\Users\Maria\AppData\Local\Temp\tmpxa113_1ewandb-media
_warnings.warn("Couldn't remove temp directory %s" % name)
C:\ProgramData\Anaconda3\lib\site-packages\wandb\compat\tempfile.py:64: UserWarning: Couldn't remove temp directory C:\Users\Maria\AppData\Local\Temp\tmpghqvixfewandb
_warnings.warn("Couldn't remove temp directory %s" % name)
c:\programdata\anaconda3\lib\site-packages\wandb\compat\tempfile.py:64: UserWarning: Couldn't remove temp directory C:\Users\Maria\AppData\Local\Temp\tmpx0vikdffwandb-media
_warnings.warn("Couldn't remove temp directory %s" % name)
c:\programdata\anaconda3\lib\site-packages\wandb\compat\tempfile.py:64: UserWarning: Couldn't remove temp directory C:\Users\Maria\AppData\Local\Temp\tmpit2fs8f5wandb
_warnings.warn("Couldn't remove temp directory %s" % name)
These warnings seem related to clean-up, but they point to a different folder (Temp), which does NOT have this problem of growing in size.
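The warnings above name directories such as tmp7apg1zqmwandb and tmpxa113_1ewandb-media in the user Temp folder. To confirm that these leftovers are not what fills the disk, the sketch below lists matching directories with their sizes. The name pattern (a tmp prefix and a wandb or wandb-media suffix) is inferred only from these messages, and tempfile.gettempdir() is assumed to resolve to the same Temp folder.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def _tree_size(path: Path) -> int:
    """Sum the sizes of regular files below `path`, skipping unreadable ones."""
    total = 0
    for f in path.rglob("*"):
        try:
            if f.is_file() and not f.is_symlink():
                total += f.stat().st_size
        except OSError:
            pass  # removed or locked while we were scanning
    return total


def leftover_wandb_temp_dirs(temp_root: Optional[str] = None) -> List[Tuple[Path, int]]:
    """Return (directory, size_in_bytes) pairs, largest first."""
    root = Path(temp_root) if temp_root else Path(tempfile.gettempdir())
    found = []
    for entry in root.iterdir():
        name = entry.name
        # Pattern inferred from the warnings: tmp<random>wandb or tmp<random>wandb-media
        if name.startswith("tmp") and name.endswith(("wandb", "wandb-media")) and entry.is_dir():
            found.append((entry, _tree_size(entry)))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in leftover_wandb_temp_dirs():
        print(f"{size / 1024**2:8.1f} MB  {path}")
```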
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 16 (2 by maintainers)
@sydholl I still have this issue; just curious, why was it closed?
Hi @maria-korosteleva,
I had the same problem and was able to move the cache directory to another drive available to me. If that's an option for you, look into setting the environment variable WANDB_CACHE_DIR (see code here).
Hey, with the most recent SDK version (0.15+), cache clean-up is a safe operation during a run and while logging. It should not interfere with ongoing artifact operations.
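For the WANDB_CACHE_DIR workaround suggested in this thread, one option is to set the variable from inside the training script. The helper below is a hypothetical sketch: it creates the target folder and exports the variable, and it should run before wandb is imported, to be safe. Setting the variable in the shell or in the system environment settings works as well.

```python
import os
from pathlib import Path


def redirect_wandb_cache(cache_dir: str) -> Path:
    """Point wandb's cache at `cache_dir`, e.g. a folder on a bigger drive.

    Hypothetical helper: call it before importing wandb so the SDK
    sees WANDB_CACHE_DIR when it first resolves its cache location.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["WANDB_CACHE_DIR"] = str(path)
    return path


# Usage (hypothetical path on a second drive):
#   redirect_wandb_cache("D:/wandb_cache")
#   import wandb
#   run = wandb.init(project="my-project")
```

This only moves the cache; it does not limit its growth.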
In fact, my exact issue is that long-running jobs exhaust the disk.
@Twilightdonkey those docs give little guidance on whether it is safe to do this during a long-running Lightning job that logs to wandb. GPT-4 suggests that it might not be safe while logging is still happening, and the code doesn't say whether it is safe either:
https://github.com/wandb/wandb/blob/a4aa143837e7a43d1f3b11fa3100b1f8e63d27f3/wandb/sdk/interface/artifacts.py#L936
@shawnlewis Would it be possible to reopen this issue?
@sydholl Can you please explain why this was closed? The problem still persists.
A new feature to limit the cache size would be highly appreciated.
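Until a built-in size limit exists, a user-side stopgap is to prune the oldest cached files until the folder fits a budget. The sketch below is hypothetical and is not wandb's own clean-up logic: it orders files by modification time (access times are unreliable on many systems), defaults to a dry run, and, given the safety concerns raised in this thread, should only be run when no wandb run is active. Recent SDK versions also provide a wandb artifact cache cleanup CLI command, which is likely the safer choice where available; check its --help output in your installed version.

```python
import os
from pathlib import Path
from typing import List


def prune_cache(cache_dir: str, max_bytes: int, dry_run: bool = True) -> List[Path]:
    """Delete the oldest files (by mtime) until the cache fits in max_bytes.

    Hypothetical stopgap, not a wandb feature. Returns the files that were
    deleted, or that would be deleted when dry_run=True.
    """
    files = []
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # vanished or inaccessible
            files.append((st.st_mtime, st.st_size, path))

    total = sum(size for _mtime, size, _path in files)
    removed = []
    # Oldest first, approximating "least recently used" without atime.
    for _mtime, size, path in sorted(files, key=lambda item: item[0]):
        if total <= max_bytes:
            break
        if not dry_run:
            try:
                path.unlink()
            except OSError:
                continue  # e.g. locked by another process on Windows; keep it
        total -= size
        removed.append(path)
    return removed
```

For example, prune_cache(os.path.expanduser("~/.cache/wandb/artifacts"), 20 * 1024**3) reports what would be removed to get under 20 GB; pass dry_run=False to actually delete.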
Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.89. Please mark this comment with 👍 or 👎 to give our bot feedback!