torchdynamo: [aot_eager] Accuracy error - BigBird
Repro: `python benchmarks/huggingface.py --accuracy --float32 --backend=aot_eager --training --only=BigBird`
My investigation
- I ran the accuracy minifier. The accuracy minifier takes each extracted Fx subgraph, runs it eagerly and then with the compiler_fn (`aot_eager` here), and compares the outputs. In this case, the accuracy minifier did not find any offending subgraph.
- I re-checked with the `eager` backend (`torchdynamo.optimize("eager")`). It passed. So, something is wrong with the `aot_eager` backend.
- At this point, I was thinking the problem must be in the interstitial Python bytecode between Fx subgraphs. BigBird has numpy random calls, so maybe we were changing the numpy rng state. But that was a dead end too, as we reset the numpy rng state before each run. The `eager` backend is passing as well, so the interstitial code is ok.
- So, I just started arbitrarily falling back to eager by adjusting the threshold on the number of ops in the Fx graph.
- After a lot of trial and error, I found that this diff worked:
```diff
diff --git a/torchdynamo/optimizations/training.py b/torchdynamo/optimizations/training.py
index 7f64c86c..b580180e 100644
--- a/torchdynamo/optimizations/training.py
+++ b/torchdynamo/optimizations/training.py
@@ -29,7 +29,8 @@ class AotAutogradStrategy(object):
     @classmethod
     def compile_fn(cls, gm: torch.fx.GraphModule, example_inputs):
-        if count_calls(gm.graph) < 2:
+        if count_calls(gm.graph) <= 2:
+            log.error(f"{gm}")
             return gm.forward  # no point for tiny graphs
         return cls(gm, example_inputs).verified_candidate()
```
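For context, `count_calls` here counts the call nodes in the captured Fx graph, so the change from `< 2` to `<= 2` also makes graphs with exactly two calls fall back to eager. A minimal sketch of that kind of counting (my approximation with a hypothetical helper name, not the actual torchdynamo implementation):

```python
import torch.fx


def count_call_nodes(graph: torch.fx.Graph) -> int:
    # Count only nodes that actually execute an op; placeholder,
    # get_attr, and output nodes are ignored. This approximates what
    # count_calls is used for in the diff above.
    return sum(
        1
        for node in graph.nodes
        if node.op in ("call_function", "call_method", "call_module")
    )
```

The skipped graph shown below has exactly two such calls (`contiguous` and `view`), which is why it is affected by the `<= 2` change.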
And the additional skipped module is:

```python
def forward(self, _stack0_0_ : torch.Tensor):
    contiguous = _stack0_0_.contiguous();  _stack0_0_ = None
    view = contiguous.view(1, 1024, -1);  contiguous = None
    return (view,)
```
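To poke at this tiny graph in isolation, one can compare both the forward output and the gradient reaching its non-parameter input between eager and `aot_eager`. This is only a rough sketch; the input shape and the decorator-style `torchdynamo.optimize` usage are assumptions:

```python
import torch
import torchdynamo


def tiny_graph(x: torch.Tensor):
    # Same ops as the skipped Fx graph above.
    return x.contiguous().view(1, 1024, -1)


def run(fn):
    torch.manual_seed(0)
    # A non-parameter input that still requires grad (hypothetical shape).
    x = torch.randn(1024, 768, requires_grad=True)
    out = fn(x)
    out.sum().backward()
    return out, x.grad


ref_out, ref_grad = run(tiny_graph)
res_out, res_grad = run(torchdynamo.optimize("aot_eager")(tiny_graph))
print(torch.allclose(ref_out, res_out), torch.allclose(ref_grad, res_grad))
```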
So, I have two questions:
- What is the issue? I don't understand why skipping the above subgraph makes it work.
- Why did the accuracy minifier fail here?
About this issue
- State: closed
- Created 2 years ago
- Comments: 16 (16 by maintainers)
Commits related to this issue
- Copy over non parameter grad (#85658) Wow, ugh silly mistake. Fix for https://github.com/pytorch/torchdynamo/issues/1291 not even sure how all the tests passed before this. Pull Request resolved: ht... — committed to pytorch/pytorch by eellison 2 years ago
This inaccuracy also seems pretty high to me. I would imagine the reference and the compiled outputs to be very different here.

In this model, there are many graph breaks, so there are many tens of subgraphs. For each subgraph, the minifier compares eager vs dynamo accuracy. If a subgraph fails, it dumps the offending subgraph and starts minifying it further (dumping at every successful minification step). The issue here is that the accuracy minifier does not find any offending subgraph, even though the final accuracy check fails.
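For reference, the per-subgraph comparison described above is, in spirit, something like this sketch (the function name, tolerances, and tuple-output handling are my assumptions, not the actual minifier code):

```python
import torch


def subgraph_accuracy_ok(gm: torch.fx.GraphModule, example_inputs, compiler_fn) -> bool:
    # Run the captured subgraph eagerly to get a reference result.
    ref = gm(*example_inputs)
    # Compile the same subgraph with the backend under test (e.g. aot_eager)
    # and run it on the same inputs.
    compiled = compiler_fn(gm, example_inputs)
    res = compiled(*example_inputs)
    # Compare element-wise within a tolerance; a failing subgraph would be
    # handed to the minifier for further reduction.
    return all(
        torch.allclose(r, c, rtol=1e-4, atol=1e-4)
        for r, c in zip(ref, res)
    )
```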