pytorch-lightning: WandbLogger(log_model=False) does not work, model is always saved to wandb logs

🐛 Bug

The WandbLogger log_model argument does not work as expected. The model checkpoint is always saved under the wandb log directory (project_name/run/checkpoints/checkpoint_name.ckpt). The checkpoint is not uploaded to wandb, only written to disk. With log_model=False it should be neither written to disk nor uploaded. This matters especially during sweeps, where hundreds of runs quickly exhaust the available disk space. I don’t use ModelCheckpoint here, so the weights should not be saved at all.

To Reproduce

    wandb_logger = WandbLogger(
        project='private_example',
        config={**model_args.__dict__, **data_args.__dict__, **training_args.__dict__},
        log_model=False,
    )
    trainer = pl.Trainer(…, logger=[wandb_logger])

Expected behavior

Checkpoints are saved if log_model=True and not saved otherwise.

Environment

  • PyTorch Version (e.g., 1.0): 1.7.1
  • OS (e.g., Linux): Debian
  • How you installed PyTorch (conda, pip, source): conda
  • Python version: 3.6.9
  • CUDA/cuDNN version: cuda 11
  • wandb version: 0.10.15 (same problem on 0.10.17)
  • lightning version: 1.1.4 (same problem on 1.1.8)

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

Hey, with the new version 1.5.9 of pytorch-lightning and wandb version 0.12.10 the same issue is arising. I have even tried log_model=False, and there is no such parameter as checkpoint_callbacks=False.

  1. Checkpointing is turned on by default. You can turn it off with checkpoint_callback=False (see the sketch after this list).

  2. It looks like wandb uploads everything that ends up in its default logging directory under wandb. If you set save_dir="./somewhere" on the logger, you can prevent this.
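
For reference, a minimal sketch of how the two suggestions above fit together, assuming the pytorch-lightning 1.1–1.4 API (where the Trainer flag is checkpoint_callback; later releases renamed it to enable_checkpointing). The project name and save_dir path are placeholders:

    import pytorch_lightning as pl
    from pytorch_lightning.loggers import WandbLogger

    # Keep wandb's own files in a dedicated directory so nothing else in the
    # working directory gets picked up and uploaded.
    wandb_logger = WandbLogger(
        project="private_example",   # placeholder project name
        save_dir="./wandb_logs",     # placeholder directory for wandb files
        log_model=False,             # do not upload checkpoints to wandb
    )

    # Turn checkpointing off entirely so no .ckpt files are written to disk.
    # On pytorch-lightning >= 1.5, use enable_checkpointing=False instead.
    trainer = pl.Trainer(
        logger=[wandb_logger],
        checkpoint_callback=False,
    )

With checkpointing disabled, nothing is written under project_name/run/checkpoints/, so there is nothing for wandb to pick up, which matches the confirmation later in this thread.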

We use wandb.save here, so it is normal that the file ends up on disk first. It should get uploaded only at the end of the run. Maybe your process gets killed and wandb with it, which would explain why artifacts are not uploaded.
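
To illustrate the mechanism being described (not the logger's exact internals): wandb.save registers files for syncing, and with the "end" policy a registered file stays on disk during training and is only uploaded when the run finishes, so a killed process never uploads it. A minimal sketch, with a placeholder project name and glob:

    import wandb

    run = wandb.init(project="private_example")  # placeholder project name

    # Register checkpoint files for syncing. With policy="end" the files stay
    # local until the run finishes; if the process is killed before that,
    # the upload never happens.
    wandb.save("checkpoints/*.ckpt", policy="end")

    run.finish()  # registered files are uploaded here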

It should be neither saved to disk nor uploaded. Is that possible with WandbLogger? I just don’t want 500 checkpoints on my SSD after a wandb sweep…

Oh, I see! I am sorry, I misunderstood what you said. Thanks! 😃

As I have already stated, the issue is confirmed for 1.3.4 and fixed in the latest version. In order to see the change, you will have to upgrade Lightning using pip install --upgrade pytorch-lightning. Yes, you can add wandb.init() and finish(), but it doesn’t matter.

Thank you, it looks like checkpoint_callback=False works correctly (though the WandbLogger documentation is misleading).