tt-metal: Falcon40b prefill 4 chip hang with 60 layers

This commit (or the previous one; they seem to have been merged together) broke prefill. The commit before it runs 60 layers without issues.

A couple of layers run normally. The frequency is at 800 MHz.

To reproduce you can run on main:

pytest models/demos/falcon40b/tests/test_falcon_end_to_end.py -k "prefill and layers_60 and BFLOAT8" --timeout 1000

However, you would need all the Falcon40b weights downloaded and prepared, which takes a while. We can also sync offline to work out how to debug this together, since I have the machine set up.
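For anyone reproducing this with the weights in place, it may help to tell a hang apart from an ordinary test failure. The sketch below wraps the command above in an outer watchdog using Python's standard `subprocess` module. The helper name `classify_run` and the 1200-second outer limit are illustrative assumptions, not part of the repository; the `--timeout` flag is assumed to come from the pytest-timeout plugin.

```python
import subprocess
import sys

# Reproduction command copied from this issue; everything else is illustrative.
REPRO_CMD = [
    "pytest",
    "models/demos/falcon40b/tests/test_falcon_end_to_end.py",
    "-k", "prefill and layers_60 and BFLOAT8",
    "--timeout", "1000",
]


def classify_run(cmd, hard_timeout_s):
    """Run cmd and return 'pass', 'fail', or 'hang' (hypothetical helper).

    'hang' means the process did not exit within hard_timeout_s seconds.
    """
    try:
        result = subprocess.run(cmd, timeout=hard_timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return "hang"
    return "pass" if result.returncode == 0 else "fail"
```

Calling `classify_run(REPRO_CMD, hard_timeout_s=1200)` from the repository root would then report `hang` if the run never returns, which is a rough way to check a candidate commit without watching the terminal.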

cc @johanna-rock-tt

About this issue

  • State: closed
  • Created 4 months ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

Great, I’ll close the issue then. The commit is in main now.

Verified - both the decode demo and prefill 60 layers pass on main + cherry-pick!