gpt-neox: Problems generating with the llama model
Hi, I tried loading the LLaMA model for inference and ran into some problems. I am using 4 V100 GPUs with a model-parallel size of 4 to load the LLaMA 7B checkpoint.
- Error in loading the llama checkpoint. The converted checkpoint generated by the script `tools/convert_raw_llama_weights_to_neox.py` provides no optimizer states, but DeepSpeed keeps trying to load the optimizer states even when I set the `finetune` flag to true. The cause seems to be here: regardless of whether the files exist, DeepSpeed returns the file list of optimizer states. I fixed this by adding an additional check that the files exist and returning None if they do not (a sketch of this idea is shown after the list below).
- Tensor shape mismatch during inference. This is fixed by changing the line here, where

  ```python
  attention_mask = attention_mask[
      ..., : attention_scores.size(3), : attention_scores.size(3)
  ]
  ```

  is changed to

  ```python
  attention_mask = attention_mask[
      ..., : attention_scores.size(2), : attention_scores.size(3)
  ]
  ```

  (a toy example illustrating the shape issue follows the list).
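For the first point, here is a minimal sketch of the kind of existence check described above. The function name and the file-name pattern are hypothetical and only illustrate the idea of returning None when no optimizer-state files are found; this is not the actual DeepSpeed code.

```python
import glob
import os


def get_optim_state_files(ckpt_dir):
    """Hypothetical helper: return optimizer-state files only if they exist.

    A converted LLaMA checkpoint ships no optimizer-state files, so the
    caller should receive None and skip loading optimizer states instead
    of failing on missing files.
    """
    # The glob pattern is an assumption made for illustration purposes.
    files = sorted(glob.glob(os.path.join(ckpt_dir, "*_optim_states.pt")))
    if not files or not all(os.path.isfile(f) for f in files):
        return None
    return files
```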
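For the second point, the following self-contained snippet shows why the two slice sizes have to differ during incremental decoding. The tensor shapes are made up for illustration and do not come from the actual model.

```python
import torch

# During incremental decoding a single new query token attends to all
# cached key positions, so the score tensor is [batch, heads, 1, kv_len].
attention_scores = torch.randn(1, 8, 1, 5)

# The precomputed causal mask covers the full sequence: [1, 1, seq, seq].
attention_mask = torch.ones(1, 1, 2048, 2048, dtype=torch.bool)

# Slicing both trailing dims with size(3) yields a square kv_len x kv_len
# mask, whose dim -2 no longer matches the query dimension of the scores.
wrong = attention_mask[..., : attention_scores.size(3), : attention_scores.size(3)]

# Slicing dim -2 with size(2) and dim -1 with size(3) keeps the shapes aligned.
right = attention_mask[..., : attention_scores.size(2), : attention_scores.size(3)]

assert right.shape[-2:] == attention_scores.shape[-2:]
print(wrong.shape, right.shape)  # torch.Size([1, 1, 5, 5]) torch.Size([1, 1, 1, 5])
```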
I wonder if my fixes are correct, or if there are better ways to fix this. I think I am only addressing the symptoms of the problem rather than its underlying causes.
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 21 (3 by maintainers)
Let's keep this issue open until these problems are resolved. We'll add a "Fixes xxx" clause to whatever PR fixes things so that it auto-closes this issue.
I just use the NeoX format for training/fine-tuning. After training, I convert it to the HF format for inference/testing.
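As a minimal sketch of the inference/testing step with a converted checkpoint, the snippet below loads it through the transformers library. The local path is hypothetical, and the exact conversion script depends on the gpt-neox version you use.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path to a checkpoint already converted to the HF format.
ckpt_dir = "./llama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
model = AutoModelForCausalLM.from_pretrained(ckpt_dir)

inputs = tokenizer("The capital of France is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```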