runtime: CompileDeepTree_NoStackOverflowFast fails StackOverflowException with debug build

@seanshpark commented on Thu Apr 20 2017

CoreFX System.Linq.Expressions.Tests teminates with StackOverflowException

Discovering: System.Linq.Expressions.Tests
Discovered:  System.Linq.Expressions.Tests
Starting:    System.Linq.Expressions.Tests
Process is terminating due to StackOverflowException.
Aborted (core dumped)

Inside, CompileDeepTree_NoStackOverflowFast test, it tests the compiler with 100 constant expression with 128KB stack size. I’m running the test with debug version of corerun (release is not stable yet). When I change the stack size to 512K, this method passes.

I’m not sure how to think about this problem. Should I test this with Release version only or is there any other ways to fix this?

Test code in CoreFX

[Theory, ClassData(typeof(CompilationTypes))]
public static void CompileDeepTree_NoStackOverflowFast(bool useInterpreter)
{
    Expression e = Expression.Constant(0);

    int n = 100;

    for (int i = 0; i < n; i++)
        e = Expression.Add(e, Expression.Constant(1));

    Func<int> f = null;
    // Request a stack size of 128KiB to get very small stack.
    // This reduces the size of tree needed to risk a stack overflow.
    // This though will only risk overflow once, so the outerloop test
    // above is still needed.
    Thread t = new Thread(() => f = Expression.Lambda<Func<int>>(e).Compile(useInterpreter), 128 * 1024);
    t.Start();
    t.Join();

    Assert.Equal(n, f());
}

@seanshpark commented on Thu Apr 20 2017

@parjong, do you have any comments to add?


@seanshpark commented on Thu Apr 20 2017

@jkotas , could you please help or invite who could be a help?


@parjong commented on Thu Apr 20 2017

I do not have much to add.


@seanshpark commented on Thu Apr 20 2017

With a same method(without the Assert.Equal(n, f()); line) run in a console app, gdb stack shows like this. this=0x0 in Compiler::compStressCompile as seems that stack has ran out.

(gdb) bt
#0  0xb17e3ab8 in Compiler::compStressCompile (this=0x0, stressArea=Compiler::STRESS_NONE, weight=0) at /home/maxwell/netcore/coreclr/src/jit/compiler.cpp:3529
dotnet/corefx#1  0xb19879ec in Compiler::fgMorphTree (this=0x80b3a20, tree=0x80ba2a8, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:14681
dotnet/corefx#2  0xb199c320 in Compiler::fgMorphSmpOp (this=0x80b3a20, tree=0x80ba368, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:11434
dotnet/corefx#3  0xb1987d61 in Compiler::fgMorphTree (this=0x80b3a20, tree=0x80ba368, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:14767
dotnet/corefx#4  0xb199c320 in Compiler::fgMorphSmpOp (this=0x80b3a20, tree=0x80ba428, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:11434
...
dotnet/corefx#61 0xb1987d61 in Compiler::fgMorphTree (this=0x80b3a20, tree=0x80bb8c8, mac=0x0) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:14767
dotnet/runtime#13867 0xb19b168d in Compiler::fgMorphStmts (this=0x80b3a20, block=0x80b6c34, mult=0xb4460227, lnot=0xb4460226, loadw=0xb4460225) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:15579
dotnet/runtime#13868 0xb19b2388 in Compiler::fgMorphBlocks (this=0x80b3a20) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:15864
dotnet/corefx#64 0xb19b612f in Compiler::fgMorph (this=0x80b3a20) at /home/maxwell/netcore/coreclr/src/jit/morph.cpp:17062
dotnet/corefx#65 0xb17e8769 in Compiler::compCompile (this=0x80b3a20, methodCodePtr=0xb4460b0c, methodCodeSize=0xb4460f20, compileFlags=0xb4460b20) at /home/maxwell/netcore/coreclr/src/jit/compiler.cpp:4299
...
dotnet/corefx#75 0xb6cc4036 in CallCompileMethodWithSEHWrapper(EEJitManager*, CEEInfo*, CORINFO_METHOD_INFO*, CORJIT_FLAGS, unsigned char**, unsigned int*, MethodDesc*)::$_4::operator()(CallCompileMethodWithSEHWrapper(EEJitManager*, CEEInfo*, CORINFO_METHOD_INFO*, CORJIT_FLAGS, unsigned char**, unsigned int*, MethodDesc*)::Param*) const (this=0xb4460d10, pParam=0xb4460d18) at /home/maxwell/netcore/coreclr/src/vm/jitinterface.cpp:12126
dotnet/corefx#76 0xb6cc3ea9 in CallCompileMethodWithSEHWrapper (jitMgr=0x8084cb8, comp=0xb4461050, info=0xb4460fa0, flags=..., nativeEntry=0xb4460f24, nativeSizeOfCode=0xb4460f20, ftn=0xb175a1a0)
    at /home/maxwell/netcore/coreclr/src/vm/jitinterface.cpp:12169
dotnet/runtime#13873 0xb6cc608e in UnsafeJitFunction (ftn=0xb175a1a0, ILHeader=0x0, flags=..., pSizeOfCode=0xb4461f44) at /home/maxwell/netcore/coreclr/src/vm/jitinterface.cpp:12813
dotnet/corefx#78 0xb720ff41 in MethodDesc::MakeJitWorker (this=0xb175a1a0, ILHeader=0x0, flags=...) at /home/maxwell/netcore/coreclr/src/vm/prestub.cpp:490
dotnet/corefx#79 0xb7214505 in MethodDesc::DoPrestub (this=0xb175a1a0, pDispatchingMT=0x0) at /home/maxwell/netcore/coreclr/src/vm/prestub.cpp:1515
dotnet/corefx#80 0xb6fd2d75 in ReflectionInvocation::CompileMethod (pMD=0xb175a1a0) at /home/maxwell/netcore/coreclr/src/vm/reflectioninvocation.cpp:2006
dotnet/corefx#81 0xb1778a84 in ?? ()
...
...

@jkotas commented on Fri Apr 21 2017

This test was added by @JonHanna in https://github.com/dotnet/corefx/pull/14444


@JonHanna commented on Fri Apr 21 2017

Has something changed in TryEnsureSufficientExecutionStack in debug that maybe the test is catching, or is this a test configuration that is new?

When I change the stack size to 512K, this method passes

From https://github.com/dotnet/corefx/issues/17550#issuecomment-289816350 it would seem that it’ll pass with 512K because it won’t test what it’s trying to test. What happens with 192K?

We could up both the stacksize and the size of the tree compiled to keep it testing what it is there to test without, though that increases execution time and the point is to do a similar job to the slower CompileDeepTree_NoStackOverflow without spending so long that it impacts heavily on how long the test suite takes. After a certain point the test no longer earns its keep, especially as there is that slower test being run on outerloop.


@parjong commented on Fri Apr 21 2017

@JonHanna Yes, this issue comes from a completely new configuration (x86/Linux Debug). CLR has been ported to x86/Linux. Recent CLR unittest result shows that debug build is stable, and thus we attempted to run FX unittest as the next step (checked/release build is under bring up).


@JonHanna commented on Fri Apr 21 2017

It would be worth temporarily taking the OuterLoop tag off this test’s outerloop cousin and seeing if it passes. If that one also fails then the problem is in TryEnsureSufficientExecutionStack on that configuration and the tests have found a bug. If that one fails then the innerloop test needs tweaking or disabling.


@parjong commented on Fri Apr 21 2017

@JonHanna We could run System.Linq.Expressions.Tests without stack overflow if this case is excluded (although there are several failures).


@JonHanna commented on Fri Apr 21 2017

If CompileDeepTree_NoStackOverflow passes (but note that it’s outerloop and won’t be included in most runs, I’m beginning to think that this test is more trouble than it’s worth; anything it catches will still be caught by the outerloop tests, albeit not on most test runs.


@janvorli commented on Fri Apr 21 2017

Taking a look at how the safe execution stack limit is determined, I can see that it checks for 128kB of remaining stack space on 64 bit architectures, but only for 64kB on 32 bit architectures. Could that be causing the problem?


@parjong commented on Fri Apr 21 2017

@JonHanna Oops. I mean all the innerloop tests. Sorry for confusion. I will let you know the result when the result is ready (I guess that I could let you know the result in the next Monday).


@seanshpark commented on Sun Apr 23 2017

it would seem that it’ll pass with 512K because it won’t test what it’s trying to test. What happens with 192K?

Thank you for the explanation. 192K also fails and 384K also but didn’t check the number that was working. We’ll check the cousin in the outter loop.


@parjong commented on Sun Apr 23 2017

@JonHanna It turns out that the corresponding outerloop test also fails with the same reason (stack overflow) if debug binary is used. This issue does not happen for checked binary.

Is it possible to add some trait that allows us to exclude this test for debug build? We could not use checked binary as checked binary is not stable yet.


@seanshpark commented on Sun Apr 23 2017

I’ve also tested and the results are

  • Debug CoreCLR + CompileDeepTree_NoStackOverflow : Pass
  • Checked CoreCLR + CompileDeepTree_NoStackOverflow : Pass
  • Debug CoreCLR + CompileDeepTree_NoStackOverflowFast : Fail
  • Checked CoreCLR + CompileDeepTree_NoStackOverflowFast : Pass

@JonHanna commented on Mon Apr 24 2017

It turns out that the corresponding outerloop test also fails with the same reason (stack overflow) if debug binary is used.

If the outerloop test is failing, I’m rather worried that it might be an actual bug in TryEnsureSufficientExecutionStack or that it isn’t sensitive enough an we can expect other uses of it to have a similar problem. Though @seanshpark found differently to you, which is curious in itself.

Is it possible to add some trait that allows us to exclude this test for debug build?

We could of course wrap it in #if !DEBUG, but someone else may know of a trait that’s applicable.


@parjong commented on Mon Apr 24 2017

@JonHanna Could you let me know why this test uses 128kB as a limit? As @janvorli mentioned in https://github.com/dotnet/coreclr/issues/11122#issuecomment-296169837, TryEnsureSufficientExecutionStack returns true if the remaining stack size is 64kB for 32-bit architecture instead of 128kB.


@seanshpark commented on Mon Apr 24 2017

Sorry for the wrong report. I’v retested with debug version from latest master and both fails.

  • Debug CoreCLR + CompileDeepTree_NoStackOverflow : Fail
  • Debug CoreCLR + CompileDeepTree_NoStackOverflowFast : Fail

@JonHanna commented on Mon Apr 24 2017

Well the idea of the “fast” one is to have a small stack so that it quickly risks stack overflow, but TryEnsureSufficientExecutionStack saves the day and it doesn’t happen. Clearly not the actual result. The “slow” one does the same thing with a more normal stack. If only the fast failed I might give it up as overly artificial, but the slow is triggering conditions that were causing users problems with real code, so I’m much more worried about that. Is there a how-to on the type of build you are doing so I can take a look myself?

CC @bartdesmet as the first author of the stackguard code and test.


@parjong commented on Mon Apr 24 2017

@JonHanna https://github.com/dotnet/coreclr/issues/9265#issuecomment-280521257 provides a brief step-by-step build instruction (you first need to create a root filesystem via running build-rootfs.sh under cross with x86 option).

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

We do not have a good reliable way to detect that you are running on debug runtime or debug build of the JIT. It would be really useful if all inner loop corefx tests work on debug builds of the runtime or JIT, on all platforms.

If this is a corner case test that does not work on debug runtime, I would suggest moving it to outer loop.