tt-metal: WH PCC error for `test_moreh_linear.py::test_moreh_linear_backward` w/ Watcher Enabled
TT_METAL_WATCHER=30 pytest tests/tt_eager/python_api_testing/unit_testing/test_moreh_linear.py::test_moreh_linear_backward
The failure is a PCC mismatch on the bias gradient:
passing, output_pcc = comp_allclose_and_pcc(torch_bias.grad, ttcpu_bias_grad, pcc=0.999, rtol=rtol, atol=atol)
logger.info(f"bias_grad passing={passing} pcc={output_pcc}")
> assert passing
E assert False
tests/tt_eager/python_api_testing/unit_testing/test_moreh_linear.py:164: AssertionError
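PCC is the Pearson correlation coefficient between the reference gradient (`torch_bias.grad`) and the device result. The test requires it to be at least 0.999. The pure-Python sketch below shows only that correlation check. `pcc` and `pcc_passes` are hypothetical helpers, not the actual `comp_allclose_and_pcc` implementation, and the `rtol`/`atol` tolerance part of the real check is left out.

```python
import math

def pcc(expected, actual):
    """Pearson correlation coefficient of two equal-length sequences (hypothetical helper)."""
    if len(expected) != len(actual) or not expected:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(expected)
    mean_e = sum(expected) / n
    mean_a = sum(actual) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(expected, actual))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_e == 0 or var_a == 0:
        # Correlation is undefined for constant inputs; treat exact equality as a match.
        return 1.0 if list(expected) == list(actual) else 0.0
    return cov / math.sqrt(var_e * var_a)

def pcc_passes(expected, actual, threshold=0.999):
    """Mirror the pcc=0.999 threshold used by the failing assertion."""
    return pcc(expected, actual) >= threshold

reference = [0.5, -1.25, 2.0, 3.5]
good = [0.5, -1.25, 2.0, 3.5]
bad = [0.5, -1.25, 0.0, 3.5]  # one corrupted element

print(pcc_passes(reference, good))  # True
print(pcc_passes(reference, bad))   # False
```

A single wrong element in a small tensor is enough to drop the correlation well below 0.999, so a PCC failure does not necessarily mean the whole gradient is wrong.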
About this issue
- State: open
- Created 4 months ago
- Comments: 25 (13 by maintainers)
Commits related to this issue
- #5868: test_moreh_linear_backward watcher error on WH — committed to tenstorrent/tt-metal by TT-billteng 4 months ago
Our engineers found that using the tile_regs_acquire, tile_regs_wait, tile_regs_commit, and tile_regs_release functions instead of the acquire_dst and release_dst functions solves the issue. This is odd, because acquire_dst and release_dst are just simple wrappers around those tile_regs_* functions. The same thing happens in issue #7521. I think we have to choose between two options:
- Change the acquire_dst and release_dst function calls to use the tile_regs_* functions.
- Find and fix the root cause in the acquire_dst and release_dst functions (I think this is out of scope of Moreh's ability).
@jliangTT, what do you think about this?
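The "simple wrappers" observation can be made concrete with a small Python model of the two call patterns. This is not kernel code. It assumes that acquire_dst performs what tile_regs_acquire and tile_regs_wait do, and that release_dst performs what tile_regs_commit and tile_regs_release do. The per-thread step labels (`math_wait_dest`, `pack_wait_math_done`, and so on) are hypothetical names, not the real LLK calls. Under that assumption, the MATH and PACK threads execute the same ordered steps in both patterns, which is why the differing results are surprising.

```python
# Hypothetical model: each kernel-level call expands into steps on the MATH or PACK thread.
# ASSUMPTION: acquire_dst == tile_regs_acquire + tile_regs_wait,
#             release_dst == tile_regs_commit + tile_regs_release.
EXPANSION = {
    "tile_regs_acquire": [("MATH", "math_wait_dest")],
    "tile_regs_wait": [("PACK", "pack_wait_math_done")],
    "tile_regs_commit": [("MATH", "math_section_done")],
    "tile_regs_release": [("PACK", "pack_section_done")],
    "acquire_dst": [("MATH", "math_wait_dest"), ("PACK", "pack_wait_math_done")],
    "release_dst": [("MATH", "math_section_done"), ("PACK", "pack_section_done")],
    "compute_tile": [("MATH", "compute")],  # any math op writing to DST
    "pack_tile": [("PACK", "pack")],
}

def per_thread_trace(calls):
    """Return the ordered steps each thread executes for a sequence of kernel calls."""
    trace = {"MATH": [], "PACK": []}
    for call in calls:
        for thread, step in EXPANSION[call]:
            trace[thread].append(step)
    return trace

wrapper_style = ["acquire_dst", "compute_tile", "pack_tile", "release_dst"]
explicit_style = ["tile_regs_acquire", "compute_tile", "tile_regs_commit",
                  "tile_regs_wait", "pack_tile", "tile_regs_release"]

print(per_thread_trace(wrapper_style) == per_thread_trace(explicit_style))  # True
```

If the assumed mapping holds, a behavioral difference has to come from something the model leaves out, such as other calls interleaved differently in the real kernel. That is a hypothesis to check, not a conclusion from this thread.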
Hey @dongjin-na, can you please follow this page https://tenstorrent.github.io/tt-metal/latest/tt-metalium/tools/watcher.html#enabling and enable watcher piece-wise to see whether you can still reproduce the failure?
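One way to enable Watcher piece-wise is to keep `TT_METAL_WATCHER=30` from the original repro and disable one Watcher feature per run. The dry-run sketch below only prints the command lines and executes nothing. The `TT_METAL_WATCHER_DISABLE_*` variable names are an assumption based on our reading of the Watcher documentation linked in the comment. They may differ between tt-metal versions, so verify them before use.

```python
import shlex

# ASSUMPTION: feature-disable variable names; check them against the Watcher docs
# for the tt-metal version under test.
WATCHER_FEATURE_FLAGS = [
    "TT_METAL_WATCHER_DISABLE_ASSERT",
    "TT_METAL_WATCHER_DISABLE_PAUSE",
    "TT_METAL_WATCHER_DISABLE_RING_BUFFER",
    "TT_METAL_WATCHER_DISABLE_NOC_SANITIZE",
    "TT_METAL_WATCHER_DISABLE_STATUS",
]

TEST_ID = ("tests/tt_eager/python_api_testing/unit_testing/"
           "test_moreh_linear.py::test_moreh_linear_backward")

def piecewise_configs(interval="30", flags=WATCHER_FEATURE_FLAGS):
    """Yield (label, env overrides): Watcher fully on, then one feature disabled per run."""
    yield "all features", {"TT_METAL_WATCHER": interval}
    for flag in flags:
        yield f"without {flag}", {"TT_METAL_WATCHER": interval, flag: "1"}

def command_lines(interval="30"):
    """Render each configuration as a shell command line (dry run, nothing is executed)."""
    lines = []
    for _label, env in piecewise_configs(interval):
        prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
        lines.append(f"{prefix} pytest {TEST_ID}")
    return lines

for line in command_lines():
    print(line)
```

The first printed line is the original repro command. If the PCC failure disappears in exactly one of the later runs, that Watcher feature is the one to investigate.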
Thanks for the investigation, @dongjin-na and @razorback3. @jliangTT @jvasilje, can you find someone who can help with debugging?