wandb: [CLI]: RuntimeError: "histogram_cpu" not implemented for 'Char'
Describe the bug
46 minutes into training and I got this error. It killed everything. Ah!!!
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1550, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1403, in _run_ddp_forward
return self.module(*args, **kwargs) # type: ignore[index]
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1547, in _call_impl
hook_result = hook(self, args, result)
File "/root/miniconda3/lib/python3.9/site-packages/wandb/wandb_torch.py", line 110, in <lambda>
lambda mod, inp, outp: parameter_log_hook(
File "/root/miniconda3/lib/python3.9/site-packages/wandb/wandb_torch.py", line 105, in parameter_log_hook
self.log_tensor_stats(data.cpu(), "parameters/" + prefix + name)
File "/root/miniconda3/lib/python3.9/site-packages/wandb/wandb_torch.py", line 231, in log_tensor_stats
tensor = flat.histc(bins=self._num_bins, min=tmin, max=tmax)
RuntimeError: "histogram_cpu" not implemented for 'Char'
Additional Files
No response
Environment
WandB version: 0.14.0
OS: ubuntu 22.04
Python version: 3.9 (Miniconda)
Versions of relevant libraries: PyTorch 2.0, CUDA 11.8
Additional Context
No response
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 5
- Comments: 15 (8 by maintainers)
It's okay; for whatever reason, when I don't set the wandb settings when launching the trainer, it automatically creates a wandb run, and that one does not crash lol
This looks to be coming from wandb.watch; maybe explicitly turn it off? Set wandb_watch=false in the alpaca-lora script (https://github.com/tloen/alpaca-lora/blob/main/finetune.py), or set os.environ["WANDB_WATCH"] = "false" in your script, and see if that works?