tt-metal: ttl.tensor.reduce_max_w operation breaks with low PCC [Wormhole]
ttl.tensor.reduce_max_w operation breaks with low PCC error in some test cases.
To Reproduce
Steps to reproduce the behavior:
- Checkout the main branch
- Run the unit test test_reduce_max_w.py using this command: pytest tests/tt_eager/python_api_testing/non_working_unit_tests/wormhole/test_reduce_max_w.py
Expected behavior
The unit test test_reduce_max_w.py contains 6 test cases, and all of them are expected to fail with a low PCC error.
For example, one of the tests is expected to fail with this result:
Max ATOL Delta: 184.0, Max RTOL Delta: 2.234375, PCC: 0.05683090068563086, Equal check failed
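For readers unfamiliar with the metric, PCC here is the Pearson correlation coefficient between the reference (golden) output and the device output. The following is a minimal NumPy-only sketch of how such a check could be computed; it is not the comparison helper used by the tt-metal tests, and the function names `pcc` and `max_atol_delta` as well as the deliberately wrong "device" result are assumptions made for illustration.

```python
import numpy as np

def pcc(expected, actual):
    """Pearson correlation coefficient between two arrays, compared element-wise."""
    e = np.asarray(expected, dtype=np.float64).ravel()
    a = np.asarray(actual, dtype=np.float64).ravel()
    return float(np.corrcoef(e, a)[0, 1])

def max_atol_delta(expected, actual):
    """Largest absolute element-wise difference between the two arrays."""
    e = np.asarray(expected, dtype=np.float64)
    a = np.asarray(actual, dtype=np.float64)
    return float(np.max(np.abs(e - a)))

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 1, 32, 64))

# Reference result: max over the last (W) dimension.
golden = x.max(axis=-1, keepdims=True)

# Hypothetical broken result: reduces over the wrong dimension (H instead of W),
# then is forced into the reference shape so the two can be compared.
wrong = x.max(axis=-2, keepdims=True)[..., :32].reshape(golden.shape)

print("PCC (correct):", pcc(golden, golden))
print("PCC (wrong):  ", pcc(golden, wrong))
print("Max ATOL delta (wrong):", max_atol_delta(golden, wrong))
```

A PCC close to 1.0 means the two outputs agree up to small numerical noise; a value near 0, as in the failure quoted above, means the device output is essentially unrelated to the reference.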
Getting Additional info for the operation under test and its behavior
To get additional information and results for the different combinations of input shapes, types, layouts and memory configs this operation was tested with, you can also run the sweeps for ttl.tensor.reduce_max_w locally and check the results. To do this:
- Follow the Getting Started page to set up the repo, environment variables and python-env
- Activate the environment: source build/python_env/bin/activate
- Run the sweeps using: python tests/tt_eager/python_api_testing/sweep_tests/run_pytorch_test.py -i tests/tt_eager/python_api_testing/sweep_tests/test_configs/ci_sweep_tests_broken/wormhole/pytorch_reduce_max_w_test.yaml -o ./result-sweeps
- After the run completes, all sweep results should be available in the specified output directory (in this case ./result-sweeps). There you will find reduce_max_w_sweep.csv, which holds all executed sweeps, including the failed ones that the unit test recreates; you can find those by searching for their unique data_seed field.
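Searching the results file for a given data_seed can be scripted. The sketch below assumes that the CSV has a header row containing a data_seed column; the `status` column, the seed value, and the helper name `rows_with_seed` are hypothetical and should be adapted to the actual file layout.

```python
import csv

def rows_with_seed(lines, seed, seed_column="data_seed"):
    """Return all CSV rows whose seed column matches the given seed.

    `lines` is any iterable of text lines, for example an open file object.
    The header layout is an assumption; adjust `seed_column` if it differs.
    """
    reader = csv.DictReader(lines)
    return [row for row in reader if row.get(seed_column) == str(seed)]

# Small in-memory stand-in for reduce_max_w_sweep.csv (hypothetical columns).
sample = [
    "data_seed,status\n",
    "1,pass\n",
    "2,fail\n",
]
print(rows_with_seed(sample, 2))
```

Against the real output, the same helper could be called as `rows_with_seed(open("./result-sweeps/reduce_max_w_sweep.csv", newline=""), seed)` with the seed taken from the failing unit test.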
About this issue
- State: closed
- Created 8 months ago
- Comments: 17 (9 by maintainers)
Commits related to this issue
- #3178: ttl.tensor.reduce_max_w operation breaks with low PCC [Wormhole] - Wormhole reduce on last dim needs a transpose work-around - Real bug fix is tracked elsewhere #3262 — committed to tenstorrent/tt-metal by muthutt 7 months ago
- #3178: ttl.tensor.reduce_max_w operation breaks with low PCC [Wormhole] - Wormhole reduce on last dim needs a transpose work-around - Real bug fix is tracked elsewhere #3262 #3605: ttl.tensor.std_hw ... — committed to tenstorrent/tt-metal by muthutt 7 months ago
- #3178: Fix for wormhole b0 reduce w — committed to tenstorrent/tt-metal by rtawfik01 5 months ago
- #3178: remove transpose after fix : fix tests — committed to tenstorrent/tt-metal by deleted user 5 months ago
Hi @muthutt @davorchap, I have a bug fix here: 043e8c5eebb522915ed0cb25bfa5ef9615b11f68
I tested it using:
and it all passes. The issue was that for REDUCE_ROW mode, Grayskull performs the transpose of the SrcA register on the math thread, whereas Wormhole B0 performs it on the unpack thread, where it is configurable using a flag. I set those flags for Wormhole B0.
Please let me know if you run into any other issues. I can push the fix once you confirm that it works on all the other max reduce w tests.
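The transpose work-around mentioned in the related commits relies on a simple identity: a max reduction over the last (W) dimension gives the same values as transposing the last two dimensions, reducing over H, and transposing back. The sketch below checks that identity on the host with NumPy only; it does not call the tt-metal API, the function names are illustrative assumptions, and it ignores any tile padding the device op may apply to its output.

```python
import numpy as np

def reduce_max_w(x):
    """Reference: max over the last (W) dimension, keeping it as size 1."""
    return x.max(axis=-1, keepdims=True)

def reduce_max_w_via_transpose(x):
    """Same result via transpose -> reduce over H -> transpose back."""
    xt = np.swapaxes(x, -1, -2)                # [..., W, H]
    reduced = xt.max(axis=-2, keepdims=True)   # [..., 1, H]
    return np.swapaxes(reduced, -1, -2)        # [..., H, 1]

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 1, 32, 64))

direct = reduce_max_w(x)
workaround = reduce_max_w_via_transpose(x)
print(direct.shape, np.array_equal(direct, workaround))
```

Because the two paths agree on the host, routing the Wormhole reduction through the H-dimension path could serve as a temporary substitute until the REDUCE_ROW transpose flags were set correctly.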