DeepSpeedExamples: Error when using BLOOMZ for reward model training

Hello, I'm trying to use BLOOMZ for reward model training, and I get this error:

Traceback (most recent call last):
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node/../../main.py", line 349, in <module>
    main()
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node/../../main.py", line 303, in main
    reward_score, acc = evaluation_reward(rm_model, eval_dataloader)
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node/../../main.py", line 249, in evaluation_reward
    outputs = model(**batch)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1695, in forward
    loss = self.module(*inputs, **kwargs)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/utils/model/reward_model.py", line 97, in forward
    assert divergence_ind > 0
AssertionError

After printing divergence_ind I found that it is 0. If I change assert divergence_ind > 0 to assert divergence_ind >= 0, will this affect the program?

Most upvoted comments

This problem is due to the bloomz-560m and bloomz-7b1 tokenizers using left-padding by default, which is really weird 😦 You can change the padding side to right-padding (see the sketch below) to avoid this problem. BTW, changing ">" to ">=" will not affect the program. However, this code is designed for right-padding, so left-padding will give completely wrong results.
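A minimal sketch of that fix, assuming the step-2 data pipeline builds a Hugging Face AutoTokenizer (the model name below is only an example):

    from transformers import AutoTokenizer

    # Sketch: force right-padding for BLOOMZ, whose tokenizer defaults to left-padding.
    tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m")
    tokenizer.padding_side = "right"  # reward_model.py assumes padding sits on the right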

@LiinXemmon Hi, this is caused by log(0), which returns -inf (so the negated loss becomes inf). I think you should add a very small value (like 1e-7) to the difference of the two sentences' rewards; it will help you avoid an inf loss during training.
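A quick standalone illustration of the failure mode (not code from the repo): once the rejected reward is much larger than the chosen reward, the sigmoid underflows to exactly 0 and the log diverges; in fp16 training this underflow happens even sooner.

    import torch

    diff = torch.tensor(-200.0)                    # chosen reward far below rejected reward
    print(torch.log(torch.sigmoid(diff)))          # -inf -> loss becomes inf
    print(torch.log(torch.sigmoid(diff) + 1e-7))   # about -16.1, finite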

Hi Luoyang, I have added 1e-7 in the reward_model.py file under the utils/model folder, but it still hits the inf loss issue. With zero_stage = 3, the loss scale drops to the minimum (1 here) and the error is raised immediately after training starts. With zero_stage = 0 it can train, but it constantly reports the Grad Overflow problem.

I solved "Grad overflow" by using bf16 rather than the default fp16. Adding 1e-7 in the reward_model.py file also works for me to avoid the inf loss. I modified the line to loss += -torch.log(torch.sigmoid(c_truncated_reward - r_truncated_reward) + 1e-7).mean()
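For reference, a minimal sketch of the bf16 switch expressed as a DeepSpeed config dict; the field names follow the DeepSpeed config documentation, but how it is wired into DeepSpeed-Chat's own config helper and launch script is an assumption left to the reader (bf16 also needs Ampere-or-newer GPUs):

    # Sketch: enable bf16 instead of the default fp16 in the DeepSpeed config.
    ds_config = {
        "train_batch_size": 32,               # example value, match your script
        "bf16": {"enabled": True},            # replaces {"fp16": {"enabled": True}}
        "zero_optimization": {"stage": 3},
    }

For what it's worth, a numerically stabler alternative to the 1e-7 epsilon is torch.nn.functional.logsigmoid(c_truncated_reward - r_truncated_reward), which never produces -inf; this is a different change from the one quoted above, not what the comment uses.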