wandb: `_disable_stats` doesn't work. Even with `wandb.init(settings=wandb.Settings(_disable_stats=True))`, it still sends stats to wandb, which in turn leads to a BSOD due to an incompatibility with the old pynvml dependency in the vendor folder.


_Originally posted by @CosmicHazel in https://github.com/wandb/client/issues/473#issuecomment-1094362410_

Can confirm that this is causing a BSOD on Windows with an NVIDIA GPU running the latest drivers. And since there's no way to disable it, there's practically no way to use wandb on Windows.

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 2
  • Comments: 31 (5 by maintainers)

Most upvoted comments

I have great news: I installed NVIDIA driver version 516.94 (previously I had 516.59), and now it doesn't crash anymore! Now I can continue advertising wandb to all my colleagues and friends! I even plan to do a presentation about wandb in one of my courses, because nobody there knows about it, even though they are deep learning enthusiasts and wandb is awesome!
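Since a driver upgrade is what resolved the crash for this user, checking the installed driver version is a useful first step when reporting or reproducing the problem. The sketch below is a hypothetical helper, not part of wandb: it shells out to `nvidia-smi --query-gpu=driver_version --format=csv,noheader` instead of using pynvml (to stay away from the library implicated here), and the 516.94 threshold is only the version one user reported as working in this thread, not an official minimum. Other commenters still saw crashes on various drivers.

```python
from __future__ import annotations

import shutil
import subprocess


def parse_driver_version(text: str) -> tuple[int, ...]:
    """Turn a driver string such as '516.94' into a comparable tuple (516, 94)."""
    return tuple(int(part) for part in text.strip().split("."))


def query_driver_version() -> str | None:
    """Ask nvidia-smi for the driver version of the first GPU; None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # nvidia-smi prints one line per GPU; assume all GPUs share one driver.
    return result.stdout.strip().splitlines()[0]


# Assumption: 516.94 is the version reported to work in this thread,
# not a documented minimum from NVIDIA or wandb.
REPORTED_GOOD = parse_driver_version("516.94")


def driver_reported_ok(version: str) -> bool:
    """True if the driver is at least the version reported to work above."""
    return parse_driver_version(version) >= REPORTED_GOOD


if __name__ == "__main__":
    found = query_driver_version()
    if found is None:
        print("nvidia-smi not found; cannot check the driver version")
    else:
        print(found, "OK" if driver_reported_ok(found) else "older than 516.94")
```

Tuple comparison handles versions with extra components, so a string like `535.129.03` compares correctly against `(516, 94)`.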

@dmitryduev Thank you so much for your efforts! @benjamincburns Also thank you for your valuable inputs! 😃

Hey all, many thanks for bringing this to our attention, and please accept my apologies for it taking us so long to properly look into it. We have updated the vendored version of nvidia-ml-py, and that PR has been merged into master. Could you please try installing wandb from master and let us know if it works now? We would really appreciate that!
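A minimal sketch of installing from master from within Python, assuming the repository is the `wandb/client` repo linked at the top of this thread (it may have been renamed since) and that the branch is still called `master`. `pip_install_from_branch` and `installed_version` are hypothetical helper names; the block only builds the pip command and reads the installed version through the standard-library `importlib.metadata`, leaving the actual install to you.

```python
from __future__ import annotations

import sys
from importlib.metadata import PackageNotFoundError, version

# Assumption: the repository linked earlier in this thread; it may have moved since.
REPO_URL = "https://github.com/wandb/client.git"


def pip_install_from_branch(repo_url: str, branch: str) -> list[str]:
    """Build (but do not run) a pip command that installs a package from a Git branch."""
    return [
        sys.executable, "-m", "pip", "install", "--upgrade",
        f"git+{repo_url}@{branch}",
    ]


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is installed by accident.
    print(" ".join(pip_install_from_branch(REPO_URL, "master")))
    print("wandb currently installed:", installed_version("wandb"))
```

Running `subprocess.run(pip_install_from_branch(REPO_URL, "master"), check=True)` performs the install; afterwards `installed_version("wandb")` shows which version actually ended up in the environment, which is handy to include when reporting back.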

I’m so sorry for the wait! I talked to the engineer in charge of this and they mentioned that they would work on it this week

Right now this is the only thing in the FAQ that addresses crashes caused by the wandb client. I think the lack of a clear resolution here doesn't align with the values conveyed in this FAQ entry, since a BSOD clearly affects my training run.


@lesliewandb @dmitryduev why was this issue closed? Running the wandb Python client with most NVIDIA driver versions in use today still causes BSODs.

If the issue is going to be closed as completed, you should at least capture notes about the workaround on the troubleshooting FAQ page. Given that I don't see any there, I strongly suspect that many users will continue encountering this problem for quite some time. https://docs.wandb.ai/guides/technical-faq/troubleshooting

Many thanks for the updates, @benjamincburns and @PeterKeffer!

@PeterKeffer: would you mind trying to update the driver to 516.94 and see if it still crashes?

@benjamincburns: I tried repro’ing on a bunch of different Tesla cards on Win 10, 11, and Server 2019, with a number of driver versions within (and outside!) the range you mentioned. Also tried a plain 2080 and it also works. Closing in on a machine with a 2080 Ti, might have an update soon.

Ah interesting. I’m really curious to know why it doesn’t repro for you on all of those boxes. I know Tesla GPUs are using a different driver series, but I wouldn’t expect much of any difference between the 2080 and the 2080 Ti. Thanks for going on such a scavenger hunt!

In the meantime, to turn off system metrics logging completely (instead of commenting out the pynvml calls), could you try:

```python
import wandb

wandb.init(settings=wandb.Settings(_disable_stats=True, _disable_meta=True))
```

Unfortunately, per the title of this issue, running with `_disable_stats=True` wasn't enough to avoid the BSOD (at least at the time of writing, unless something has changed since). I'll give it another try sometime in the next week and report back, though.

Edit: oh, I see - we need the extra `_disable_meta` arg. Thanks, I'll make sure to include that when I test next time.
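Putting the maintainer's suggestion into a small helper keeps the workaround in one place. This is a sketch, assuming the underscore-prefixed `_disable_stats` and `_disable_meta` settings behave as described in this thread; they are private settings and may be renamed or removed in other wandb releases. `safe_wandb_settings` is a hypothetical name, and applying it only on Windows by default is an assumption based on where the BSOD was reported.

```python
from __future__ import annotations

import platform


def safe_wandb_settings(disable_system_metrics: bool | None = None) -> dict[str, bool]:
    """Return keyword arguments for wandb.Settings that switch off system-metrics collection.

    If disable_system_metrics is None, decide based on the OS: the BSOD in this
    thread was reported on Windows, so only Windows gets the workaround by default.
    """
    if disable_system_metrics is None:
        disable_system_metrics = platform.system() == "Windows"
    if not disable_system_metrics:
        return {}
    # Per this thread, _disable_stats alone was not enough; _disable_meta is
    # added as the maintainer suggested. Both are private (underscore) settings.
    return {"_disable_stats": True, "_disable_meta": True}


if __name__ == "__main__":
    print(safe_wandb_settings())
```

Pass the result as `wandb.init(settings=wandb.Settings(**safe_wandb_settings()))`. Returning a plain dict keeps the helper importable and testable without wandb installed.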