tt-metal: Rebase and Merge: FD pipelines hanging on N300 after python sweep tests

With all the tunneling and FD changes, we are seeing a deterministic CI hang, apparently while setting up the remote chip. It only happens on N300 machines in CI; I cannot reproduce it locally, at least not on a t3000. If I comment out ./tests/scripts/run_python_sweep_tests.sh from fast-dispatch-build-and-unit-tests.yaml, the tests pass.

Example: https://github.com/tenstorrent-metal/tt-metal/actions/runs/8127882306/job/22213201896

About this issue

  • State: closed
  • Created 4 months ago
  • Comments: 19 (13 by maintainers)

Most upvoted comments

To enable the tunnelling PR (#5986), we are going to temporarily move the R chip gtests into the nightly test suite and run only the L chip tests on post-commit.

Once we get the syseng drop for the tt-smi fix + 800 MHz (targeted for 03/06), we will re-test the R chips on VMs.

@TT-billteng and @tt-rkim, is N300 clean now? Can we close?