wandb: artifacts do not clean .cache after upload

wandb --version && python --version && uname

  • Weights and Biases version: 0.9.4
  • Python version: 3.6.10
  • Operating System: Win 10

Description

Artifact uploads in wandb runs don't seem to clean up their cache files properly.

As I work on my ML experiments, the C:\Users\<Username>\.cache\wandb\artifacts folder grows until it exhausts all the free space on my drive (up to 50 GB). At that point, I have to clean it manually before I can resume my work.

The total size of all artifacts uploaded within a single run is about 5 GB (counting all versions of the same files), which is definitely less than the size of the .cache folder by the time I discover the problem.
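To compare the cache folder against the roughly 5 GB a run actually uploads, a minimal stdlib-only sketch can total the folder's size. The default path is the one reported above; the helper names `dir_size_bytes` and `human_size` are hypothetical and not part of wandb.

```python
import os
import stat
from pathlib import Path


def dir_size_bytes(root):
    """Sum the sizes of all regular files under `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished mid-walk or is unreadable
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def human_size(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


if __name__ == "__main__":
    # Default artifact cache location from this issue; adjust if redirected.
    cache = Path.home() / ".cache" / "wandb" / "artifacts"
    if cache.exists():
        print(cache, human_size(dir_size_bytes(cache)))
    else:
        print(cache, "does not exist")
```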

Additional errors

At times I get these errors at the end of the run:

C:\ProgramData\Anaconda3\lib\site-packages\wandb\compat\tempfile.py:64: UserWarning: Couldn't remove temp directory C:\Users\Maria\AppData\Local\Temp\tmp7apg1zqmwandb
  _warnings.warn("Couldn't remove temp directory %s" % name)
C:\ProgramData\Anaconda3\lib\site-packages\wandb\compat\tempfile.py:64: UserWarning: Couldn't remove temp directory C:\Users\Maria\AppData\Local\Temp\tmpxa113_1ewandb-media
  _warnings.warn("Couldn't remove temp directory %s" % name)
C:\ProgramData\Anaconda3\lib\site-packages\wandb\compat\tempfile.py:64: UserWarning: Couldn't remove temp directory C:\Users\Maria\AppData\Local\Temp\tmpghqvixfewandb
  _warnings.warn("Couldn't remove temp directory %s" % name)
c:\programdata\anaconda3\lib\site-packages\wandb\compat\tempfile.py:64: UserWarning: Couldn't remove temp directory C:\Users\Maria\AppData\Local\Temp\tmpx0vikdffwandb-media
  _warnings.warn("Couldn't remove temp directory %s" % name)
c:\programdata\anaconda3\lib\site-packages\wandb\compat\tempfile.py:64: UserWarning: Couldn't remove temp directory C:\Users\Maria\AppData\Local\Temp\tmpit2fs8f5wandb
  _warnings.warn("Couldn't remove temp directory %s" % name)

These warnings seem related to cleanup, but they point to a different Temp folder, which does NOT have the size-growth problem.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 16 (2 by maintainers)

Most upvoted comments

@sydholl I still have this issue; just curious, why was it closed?

Hi @maria-korosteleva,

I had the same problem and was able to move the cache directory to another drive available to me. If that’s an option for you, look into setting the environment variable WANDB_CACHE_DIR.
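A minimal sketch of that workaround, assuming the variable only needs to be set in the process environment before wandb reads it (setting it before `import wandb` / `wandb.init()` is the cautious choice). The helper name `redirect_wandb_cache` is hypothetical, not a wandb function.

```python
import os
from pathlib import Path


def redirect_wandb_cache(cache_dir):
    """Create `cache_dir` if needed and point WANDB_CACHE_DIR at it.

    Call this before importing wandb or calling wandb.init() so the SDK
    sees the variable when it resolves its cache location.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["WANDB_CACHE_DIR"] = str(path)
    return path
```

For example, `redirect_wandb_cache("D:/wandb-cache")` at the top of a training script, followed by `import wandb`. Setting the variable in the shell instead (`set WANDB_CACHE_DIR=D:\wandb-cache` in Windows cmd) avoids touching the script at all.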

Hey, with the most recent SDK version (0.15+), cache cleanup is a safe operation during a run, including while it is logging. It should not interfere with ongoing artifact operations.

In fact, my exact issue is that long-running jobs exhaust the disk.

@Twilightdonkey those docs give little guidance on whether it is safe to do this during a long-running Lightning job that is logging to wandb. GPT-4 suggests that this might not be safe while logging is still happening. Nor does the code indicate whether this is a safe operation:

https://github.com/wandb/wandb/blob/a4aa143837e7a43d1f3b11fa3100b1f8e63d27f3/wandb/sdk/interface/artifacts.py#L936

@shawnlewis Would it be possible to reopen this issue?

@sydholl Can you please explain why this was closed? The problem still persists.

wandb, version 0.13.10
Python 3.8.16
Linux

A new feature to limit the cache size would be highly appreciated.
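Until such a feature exists, a size cap can be approximated from outside the SDK. The sketch below is a hypothetical stdlib-only helper, `enforce_cache_limit`; it is not part of wandb and not its cleanup logic. It deletes the oldest files (by modification time) under a cache folder until the total is at or below a byte budget. It assumes the cache tolerates entries disappearing (that is an assumption, not documented behavior), and deleting files under a running job is exactly the safety concern raised in this thread, so running it between runs, or with `dry_run=True` first, is the cautious choice.

```python
import os
import stat


def list_cache_files(root):
    """Return (mtime, size, path) for every regular file under `root`."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                entries.append((st.st_mtime, st.st_size, path))
    return entries


def enforce_cache_limit(root, max_bytes, dry_run=False):
    """Delete the oldest files (by mtime) until the total size is <= max_bytes.

    Hypothetical helper, not a wandb API. Returns the paths that were
    removed, or that would be removed when dry_run is True.
    """
    entries = sorted(list_cache_files(root))  # oldest modification time first
    total = sum(size for _mtime, size, _path in entries)
    removed = []
    for _mtime, size, path in entries:
        if total <= max_bytes:
            break
        if not dry_run:
            try:
                os.remove(path)
            except OSError:
                continue  # locked (e.g. on Windows) or already gone; keep counting it
        total -= size
        removed.append(path)
    return removed
```

For example, `enforce_cache_limit(Path.home() / ".cache" / "wandb" / "artifacts", 10 * 1024**3, dry_run=True)` lists what would be deleted to bring the folder under 10 GiB. Evicting by modification time rather than access time is deliberate: access times are often disabled or coarse on Windows and Linux.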

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.89. Please mark this comment with 👍 or 👎 to give our bot feedback!
