torchdynamo: [aot_eager] Accuracy error - BigBird

Repro - python benchmarks/huggingface.py --accuracy --float32 --backend=aot_eager --training --only=BigBird

My investigation

  • I ran the accuracy minifier. The accuracy minifier takes each extracted Fx subgraph, runs it eagerly and then with the compiler_fn (aot_eager here), and compares the outputs. In this case, the accuracy minifier did not find any offending subgraph.

  • I rechecked the eager backend – torchdynamo.optimize("eager"). It passed. So, something is wrong with the aot_eager backend.

  • At this point, I was thinking the problem must be in the interstitial Python bytecode that runs between Fx subgraphs. BigBird makes numpy random calls, so maybe we were perturbing the numpy RNG state. But that was a dead end too: we reset the numpy RNG state before each run, and the eager backend passes, so the interstitial code is fine.

  • So, I just started arbitrarily falling back to eager here by adjusting the threshold on the number of ops in the Fx graph:

https://github.com/pytorch/torchdynamo/blob/d6e2101148eb561aa8658765e0de1801e89f22eb/torchdynamo/optimizations/training.py#L32-L33

  • After a lot of trial and error, I found that this diff worked:
diff --git a/torchdynamo/optimizations/training.py b/torchdynamo/optimizations/training.py
index 7f64c86c..b580180e 100644
--- a/torchdynamo/optimizations/training.py
+++ b/torchdynamo/optimizations/training.py
@@ -29,7 +29,8 @@ class AotAutogradStrategy(object):

     @classmethod
     def compile_fn(cls, gm: torch.fx.GraphModule, example_inputs):
-        if count_calls(gm.graph) < 2:
+        if count_calls(gm.graph) <= 2:
+            log.error(f"{gm}")
             return gm.forward  # no point for tiny graphs
         return cls(gm, example_inputs).verified_candidate()

And the additional skipped module is shown below (a standalone repro sketch follows the listing):

def forward(self, _stack0_0_ : torch.Tensor):
    contiguous = _stack0_0_.contiguous();  _stack0_0_ = None
    view = contiguous.view(1, 1024, -1);  contiguous = None
    return (view,)
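
Here is a minimal standalone sketch of that skipped subgraph (my own repro, not from the issue), using the PyTorch 2.x torch.compile entry point with backend="aot_eager" rather than the older torchdynamo.optimize; the input shape is a hypothetical choice. Run in isolation, the eager and aot_eager outputs match, which is consistent with the minifier not flagging this subgraph on its own:

import torch

class TinyGraph(torch.nn.Module):
    def forward(self, x: torch.Tensor):
        # Same ops as the skipped Fx subgraph: contiguous() followed by a view.
        return (x.contiguous().view(1, 1024, -1),)

x = torch.randn(1024, 64)  # hypothetical shape; anything reshapable to (1, 1024, -1) works
eager_out, = TinyGraph()(x)
compiled_out, = torch.compile(TinyGraph(), backend="aot_eager")(x)
print(torch.allclose(eager_out, compiled_out))  # prints True when the graph runs in isolation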

So, I have two questions:

  • What is the issue? I don’t understand why skipping the above subgraph makes it work.
  • Why did the accuracy minifier fail here?

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

Note the above graph is without functionalization, and the inaccuracy is pretty minimal - torchdynamo.utils: [ERROR] RMSE (res-fp64): 0.01575, (ref-fp64): 0.00002

This inaccuracy also seems pretty high to me. I would imagine res and ref to be very different here.
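
For context on those numbers: the check compares both the compiled result (res) and the eager fp32 result (ref) against an fp64 golden run. A rough sketch of that kind of comparison (my own approximation, not the actual torchdynamo.utils code; the tolerance ratio is a guess) might look like:

import torch

def rmse(a: torch.Tensor, b: torch.Tensor) -> float:
    return torch.sqrt(torch.mean((a.double() - b.double()) ** 2)).item()

def accuracy_ok(res, ref, ref_fp64, tol_ratio=10.0):
    res_err = rmse(res, ref_fp64)  # compiled fp32 output vs fp64 baseline
    ref_err = rmse(ref, ref_fp64)  # eager fp32 output vs fp64 baseline
    print(f"RMSE (res-fp64): {res_err:.5f}, (ref-fp64): {ref_err:.5f}")
    return res_err <= tol_ratio * max(ref_err, 1e-6)  # hypothetical pass/fail criterion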

In this model, there are many graph breaks, so there are tens of subgraphs.

For each subgraph, the minifier compares eager vs. dynamo accuracy. If a subgraph fails, it dumps the offending subgraph and starts minifying it further (dumping at every successful minification step).

The issue here is that the accuracy minifier does not find any offending subgraph, even though the end-to-end accuracy check fails.
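
To make that failure mode concrete, here is a rough sketch of the per-subgraph check described above (an assumption about the workflow, not the real minifier code; the driver loop and its helpers are hypothetical). Because the BigBird error only materializes in the composition of subgraphs plus the interleaved Python code, every isolated check like this passes and nothing is ever dumped:

import torch

def subgraph_matches(gm: torch.fx.GraphModule, example_inputs, compiler_fn) -> bool:
    ref = gm(*example_inputs)                                # eager reference run
    res = compiler_fn(gm, example_inputs)(*example_inputs)   # e.g. aot_eager
    return all(torch.allclose(r, c, atol=1e-4) for r, c in zip(ref, res))

# Hypothetical driver: only failing subgraphs would be dumped and minified.
# for gm, inputs in captured_subgraphs:
#     if not subgraph_matches(gm, inputs, aot_eager_compiler):
#         dump_and_minify(gm, inputs)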