gpt-neox: Fine-tuning 20B model doesn't seem to work
Hi,
I’m trying to fine-tune the 20B model. I tried the current version of the code as well as this one. I am using Docker, and I have tried several images from the past year (from the most recent ones back to the one labeled as “release”). I tried both the slim and the full weights.
I tried nodes with 8 and with 16 A100s (40 GB), so I don’t think it is a memory issue. I am using the 20B.yml config file, to which I am adding:
{ "finetune": true, "no_load_optim": true, "no_load_rng": true, "iteration": 0 }
With the newer Docker images, I get an error that says “Empty ds_version in checkpoint”; I guess this is related to this issue.
However, when I use the older Docker images (with both the new and the legacy versions of the code), I get `AttributeError: 'NoneType' object has no attribute 'dp_process_group'`. I guess this is related to this issue. As someone said at the time, “this is an error with deepspeed trying to load zero optimizer states if you specify one in your config, even if we set load_optim to false.” Setting the ZeRO stage to 0, the model loads but crashes later (similar to this issue).
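Concretely, by “setting the ZeRO stage to 0” I mean overriding the `stage` field in the `zero_optimization` section of the config, roughly like this (the rest of that section is left as it ships in 20B.yml, which as far as I remember defaults to stage 1):

```
"zero_optimization": {
  "stage": 0   # stock config uses stage 1; with 1 I hit the dp_process_group error
}
```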
Do you have any ideas? Thank you!
About this issue
- State: open
- Created a year ago
- Comments: 17 (8 by maintainers)
@afeb-75 – Is there a reason you can’t load the model with zero stage 1?
I’m sorry you’ve been having trouble with this. We are aware of the issue but do not have the personnel to prioritize patching it at this time. For now, I recommend using the HuggingFace `transformers` library for finetuning the model. If you are interested in developing and contributing a patch, we would be ecstatic to merge it into `main` to prevent others from struggling with this.
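A rough sketch of what that could look like is below. The model ID is the public `EleutherAI/gpt-neox-20b` checkpoint on the Hub; the dataset, hyperparameters, and the `ds_config.json` file are placeholders, and in practice you will need some form of multi-GPU sharding (e.g. DeepSpeed or FSDP via `accelerate`, or a parameter-efficient method) to fit a 20B model:

```python
# Minimal sketch: fine-tune GPT-NeoX-20B via HuggingFace transformers.
# Dataset, hyperparameters, and ds_config.json are placeholders only.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # NeoX tokenizer has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any causal-LM text dataset works here; wikitext is just a stand-in.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
raw = raw.filter(lambda x: len(x["text"]) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# For causal LM the labels are the input ids themselves (mlm=False).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="neox20b-finetune",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=1e-5,
    bf16=True,
    deepspeed="ds_config.json",  # placeholder ZeRO config for sharding the 20B model
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```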