doremi: Cannot reproduce the results shown in the GitHub repo with the 120M reference model on A800 (8×80G).

Hi, thanks for sharing this code base.

After running `bash scripts/run_pile.sh`, I obtain the following results: [results screenshot]

The generated domain weights differ slightly from the released domain weights: [comparison screenshot]
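For reference, a minimal sketch like the following could quantify the gap; the file names and the domain-to-weight JSON format are my assumptions, not the repo's actual output layout:

```python
# Hedged sketch: measure how far the generated domain weights are from the
# released ones. Paths and the {domain: weight} JSON format are hypothetical.
import json

with open("generated_domain_weights.json") as f:  # hypothetical path
    generated = json.load(f)
with open("released_domain_weights.json") as f:   # hypothetical path
    released = json.load(f)

# Total variation distance between the two weight distributions.
tv = 0.5 * sum(abs(generated[d] - released[d]) for d in released)
print(f"Total variation distance: {tv:.4f}")
```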

Since I am in Mainland China, I downloaded the tokenizer manually. However, I could not find togethercomputer/RedPajama-INCITE-Base-7B-v0.1 on Hugging Face, so I used the tokenizer togethercomputer/RedPajama-INCITE-Base-7B instead. I believe they are the same.
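In case it helps others behind the same network restriction, here is a minimal sketch of loading a manually downloaded tokenizer from a local directory; the local path is an assumption:

```python
# Hedged sketch: point transformers at a local copy of the tokenizer files
# instead of downloading from the Hub. The directory path is hypothetical.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/path/to/RedPajama-INCITE-Base-7B"  # dir containing the tokenizer files
)
print(tokenizer("hello world"))
```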

About this issue

  • State: open
  • Created 6 months ago
  • Comments: 17 (3 by maintainers)

Most upvoted comments

One thing that might be different is that I used flash-attention 2.0.4 for the results in the README, while the repo currently points to flash-attention 2.0.0 (I did this because I had to make some manual changes to 2.0.4 to make it work with the version of transformers used in the repo). However, it seems that 2.0.0 might break generation (https://github.com/huggingface/transformers/issues/26697). I just pushed the flash-attention version I used to the repo; could you see if that makes a difference?
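A quick way to confirm which build is actually installed before rerunning (assuming the installed `flash_attn` package exposes `__version__`, which recent releases do):

```python
# Hedged sketch: check the installed flash-attention version, since 2.0.0
# vs. 2.0.4 appears to change the results.
import flash_attn

print(flash_attn.__version__)  # expect 2.0.4 with the newly pushed version
```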

@kiseliu Thanks for the information. I would like to discuss the experimental configurations in more detail; could you check the email I sent to your Gmail address?