wandb: [CLI]: RuntimeError: "histogram_cpu" not implemented for 'Char'

Describe the bug

46 minutes into training I got this error, and it killed everything. Ah!!!


  File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1550, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1403, in _run_ddp_forward
    return self.module(*args, **kwargs)  # type: ignore[index]
  File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1547, in _call_impl
    hook_result = hook(self, args, result)
  File "/root/miniconda3/lib/python3.9/site-packages/wandb/wandb_torch.py", line 110, in <lambda>
    lambda mod, inp, outp: parameter_log_hook(
  File "/root/miniconda3/lib/python3.9/site-packages/wandb/wandb_torch.py", line 105, in parameter_log_hook
    self.log_tensor_stats(data.cpu(), "parameters/" + prefix + name)
  File "/root/miniconda3/lib/python3.9/site-packages/wandb/wandb_torch.py", line 231, in log_tensor_stats
    tensor = flat.histc(bins=self._num_bins, min=tmin, max=tmax)
RuntimeError: "histogram_cpu" not implemented for 'Char'
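The frames above show the failing path. A forward hook that wandb.watch installs (parameter_log_hook) copies each parameter to the CPU, and log_tensor_stats then calls histc on it. 'Char' is PyTorch's name for torch.int8, and the CPU histogram kernel in this setup has no int8 implementation, so at least one watched parameter must be int8. The report doesn't say why; 8-bit quantized weights are a plausible source, but that is an assumption. Below is a minimal sketch that needs only PyTorch. non_float_parameters and histc_any_dtype are hypothetical helpers, not wandb or PyTorch APIs. The first lists the parameters that would trip the hook, and the second shows that casting to float avoids the error.

```python
import torch
from torch import nn


def non_float_parameters(model: nn.Module) -> list[str]:
    """Hypothetical helper: names of parameters whose dtype isn't floating point.

    These are the tensors that would reach torch.histc with an integer dtype
    if wandb.watch is logging parameter histograms.
    """
    return [
        name
        for name, param in model.named_parameters()
        if not param.is_floating_point()
    ]


def histc_any_dtype(tensor: torch.Tensor, bins: int = 64) -> torch.Tensor:
    """Hypothetical helper: histogram that casts integer tensors to float32 first."""
    flat = tensor.detach().reshape(-1).cpu()
    if not flat.is_floating_point():
        flat = flat.float()
    return torch.histc(flat, bins=bins, min=flat.min().item(), max=flat.max().item())


if __name__ == "__main__":
    model = nn.Linear(4, 2)
    # Stand-in for an 8-bit quantized weight: an int8 tensor registered as a
    # frozen parameter (integer tensors can't require gradients).
    model.register_parameter(
        "int8_weight",
        nn.Parameter(torch.arange(-128, 128).to(torch.int8), requires_grad=False),
    )
    print(non_float_parameters(model))  # ['int8_weight']

    try:
        # Expected to fail on the PyTorch 2.0 CPU build from the report.
        torch.histc(model.int8_weight.cpu(), bins=64, min=-128, max=127)
    except RuntimeError as err:
        print("histc on int8 failed:", err)

    # Casting to float first gives a normal histogram of all 256 values.
    print(histc_any_dtype(model.int8_weight).sum().item())  # 256.0
```

Any name that non_float_parameters reports on your model would hit this code path whenever parameter histograms are enabled.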

Additional Files

No response

Environment

WandB version: 0.14.0

OS: ubuntu 22.04

Python version: 3.9 (Miniconda)

Versions of relevant libraries: PyTorch 2.0, CUDA 11.8

Additional Context

No response

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 5
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Hi @teknium1, this PR may fix the issue, but it's currently under review. I will update this thread once it's merged into the master branch.

It's okay. For whatever reason, when I don't set the wandb settings when launching the trainer, it automatically creates a wandb run, and that one does not crash, lol.

This looks like it's coming from wandb.watch; maybe turn it off explicitly?

Set wandb_watch=false in the alpaca-lora script: https://github.com/tloen/alpaca-lora/blob/main/finetune.py

Or set os.environ["WANDB_WATCH"] = "false" in your script and see if that works.
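Both suggestions come down to the WANDB_WATCH environment variable. The Hugging Face Trainer's W&B integration reads it when deciding whether to call wandb.watch, and alpaca-lora's wandb_watch argument appears to be forwarded into it. Below is a minimal sketch of setting it from Python. set_wandb_watch is a hypothetical helper, and the set of accepted values is an assumption based on recent transformers releases.

```python
import os

# Values the Hugging Face Trainer's W&B integration accepts for WANDB_WATCH
# (assumption based on recent transformers releases).
WATCH_MODES = {"false", "gradients", "parameters", "all"}


def set_wandb_watch(mode: str = "false") -> None:
    """Hypothetical helper: set WANDB_WATCH; call it before building the Trainer."""
    if mode not in WATCH_MODES:
        raise ValueError(f"unsupported WANDB_WATCH value: {mode!r}")
    os.environ["WANDB_WATCH"] = mode


if __name__ == "__main__":
    # "false" skips wandb.watch, so no parameter-histogram hooks are installed.
    set_wandb_watch("false")
    print(os.environ["WANDB_WATCH"])  # false
```

With "false", wandb.watch is never called. Using "gradients" should keep gradient histograms while skipping the parameter hook that crashed, because frozen int8 weights have no gradients to log.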