doremi: Cannot reproduce the results shown in the GitHub repo with the 120M reference model on A800 (8×80GB).
Hi, thanks for sharing this code base.
After running `bash scripts/run_pile.sh`, I obtain the following results: the generated domain weights differ slightly from the released domain weights.
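For reference, this is roughly how I compared the two sets of weights. It is only a minimal sketch; the file names and the JSON layout (a dict mapping domain names to floats) are assumptions on my side, not the repo's actual output format.

```python
import json

# Hypothetical file names; adjust to wherever the weights actually live.
with open("generated_domain_weights.json") as f:
    generated = json.load(f)  # e.g. {"Pile-CC": 0.67, "Github": 0.05, ...}
with open("released_domain_weights.json") as f:
    released = json.load(f)

# Report the per-domain gap, largest absolute difference first.
diffs = {d: generated[d] - released[d] for d in released}
for domain, delta in sorted(diffs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{domain:20s} {delta:+.4f}")
print("max abs diff:", max(abs(v) for v in diffs.values()))
```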
Since I am in Mainland China, I downloaded the tokenizer manually. However, I could not find togethercomputer/RedPajama-INCITE-Base-7B-v0.1 on Hugging Face, so I used the togethercomputer/RedPajama-INCITE-Base-7B tokenizer instead. I think they are the same.
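In case it matters, this is how I load the manually downloaded tokenizer. A minimal sketch; the local path is just where I happened to save the files, not anything the repo expects.

```python
from transformers import AutoTokenizer

# Local directory containing the files downloaded from the Hugging Face hub
# (tokenizer.json / tokenizer_config.json / special_tokens_map.json, etc.).
local_path = "/data/tokenizers/RedPajama-INCITE-Base-7B"  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(local_path)
print(tokenizer)
```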
About this issue
- Original URL
- State: open
- Created 6 months ago
- Comments: 17 (3 by maintainers)
One thing that might be different is that I used flash attention 2.0.4 for the results in the README, while the repo right now points to flash attention 2.0.0 (I did this because I had to make some manual changes to 2.0.4 to make it work with the version of transformers used in the repo). However, it seems that 2.0.0 might break generation (https://github.com/huggingface/transformers/issues/26697). I pushed the flash attention version I used to the repo just now; could you see if that makes a difference?
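If it helps to double-check, here is a quick way to confirm which flash-attn build is actually installed in the environment. This is only a sketch; 2.0.4 is just the version mentioned above, not something the repo enforces.

```python
import flash_attn

# The README results used flash-attn 2.0.4; the older 2.0.0 build may break
# generation (see the linked transformers issue above).
version = flash_attn.__version__
print("flash-attn version:", version)
if version.startswith("2.0.0"):
    print("Warning: this is the 2.0.0 build; consider the 2.0.4-based version pushed to the repo.")
```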
@kiseliu Thanks for the information. I would like to discuss the experimental configurations in more detail. Could you check the email I sent to your Gmail address?