runtime: Tail Call bug in RyuJIT - Incorrect parameters passed

The following critical bug is known and fixed internally (not released) by Microsoft (per private discussions while security was evaluated). In this bug “we” refers to Marc Gravell (@mgravell) and I who tracked down the issue on the Stack Overflow side on the fence. I am posting it here for public visibility and to help users hitting the same issue. I have also created a blog post addressing discovery and severity here: Why you should wait on upgrading to .Net 4.6.

There is an issue in the .Net 4.6 RTM RyuJIT implementation that incorrectly hooks up parameters during some tail call optimizations. The net result of this is that tail call methods can get the wrong parameters passed. Here’s our code exhibiting the issue:

void Set<T>(string key, T val, int? durationSecs, bool sliding, bool broadcastRemoveFromCache = false)
{
    SetWithPriority<T>(key, val, durationSecs, sliding, CacheItemPriority.Default);
}

void SetWithPriority<T>(string key, T val, int? durationSecs, bool isSliding, CacheItemPriority priority)
{
    key = KeyInContext(key);

    RawSet(key, val, durationSecs, isSliding, priority);
}

void RawSet(string cacheKey, object val, int? durationSecs, bool isSliding, CacheItemPriority priority)
{
    var absolute = !isSliding && durationSecs.HasValue 
                   ? DateTime.UtcNow.AddSeconds(durationSecs.Value) 
                   : Cache.NoAbsoluteExpiration;
    var sliding = isSliding && durationSecs.HasValue 
                  ? TimeSpan.FromSeconds(durationSecs.Value) 
                  : Cache.NoSlidingExpiration;

    HttpRuntime.Cache.Insert(cacheKey, val, null, absolute, sliding, priority, null);
}

The RawSet() is the last in the chain, and only when optimizations are enabled, is incorrectly optimized. If for example we call Set<T> with a durationSecs of 3600, we would expect that value to go all the way down the method chain. This doesn’t happen. Instead, the value passed to the tail method (RawSet) is seemingly random (our assumption is it pulls some other value from the stack). For example, it may be 30, or 57, or null.

The net result for us is that items are getting cached for drastically shorter durations, or not at all (null often gets passed). On Stack Overflow this causes unpredictable local HTTP cache usage and more hits and heavier load to anything populating it. This is a production blocker on deployment for us, and (we believe) should be for anyone else.

We have collapsed the reproduction of this bug into a project you can run locally, available on GitHub. Here’s an example test run (see the repo README for a full description):

Results Window

I can’t stress how serious of a bug this is, due to the subtleness of the occurrence, the “only production” likelihood given the RELEASE-only nature, and the scary examples we can easily come up with. When the parameters you’re passing aren’t the ones the method is getting, all sanity goes out the window.

What if your method says how much stock to buy? What if it gives dosing information to a patient? What if it tells a plane what altitude to climb to?

This bug is critical, I am posting the issue for several reasons:

  • We’re not jerks
  • To raise awareness
  • To prevent as many people as possible from deploying affected code
  • To usher a fix and deployment from Microsoft as soon as possible

If you have already deployed .Net 4.6, our recommendation at this point is to disable RyuJIT immediately.

About this issue

  • Original URL
  • State: closed
  • Created 9 years ago
  • Comments: 63 (37 by maintainers)

Most upvoted comments

Not being limited to the memory space of the process means this could introduce instability into systems running 4.6, and break the process isolation of most shared hosting environments.

It will not break process isolation as Ayende said, because reading from another process like that is not allowed in Windows. However, applications in the same application pool can affect each other since an app domain in an application pool cannot provide isolation for this, I think.

There is no way for the process to be just randomly reading some other process memory. You need ReadMemory() calls to do that, and those don’t just happen, even with this issue.

*Hibernating Rhinos Ltd *

Oren Eini* l CEO l *Mobile: + 972-52-548-6969

Office: +972-4-622-7811 *l *Fax: +972-153-4-622-7811

On Tue, Jul 28, 2015 at 4:00 PM, Tim Rayburn notifications@github.com wrote:

For those who have been deep into the internals for this bug, can anyone confirm that the random memory address that is being read to get the “incorrect” value is 100% Absolutely Without A Doubt coming from within the memory space of the .NET Process? If not, as horrible as this is already, this becomes much worse as an angle of attack to read and/or write random memory locations on boxes running .NET Framework 4.6.

Not being limited to the memory space of the process means this could introduce instability into systems running 4.6, and break the process isolation of most shared hosting environments.

— Reply to this email directly or view it on GitHub https://github.com/dotnet/coreclr/issues/1296#issuecomment-125597778.

For those who have been deep into the internals for this bug, can anyone confirm that the random memory address that is being read to get the “incorrect” value is 100% Absolutely Without A Doubt coming from within the memory space of the .NET Process? If not, as horrible as this is already, this becomes much worse as an angle of attack to read and/or write random memory locations on boxes running .NET Framework 4.6.

Not being limited to the memory space of the process means this could introduce instability into systems running 4.6, and break the process isolation of most shared hosting environments.

I have found some troubles with asm code. Part 1. screen1 The call of OnLogDuration from SetWithPriority:

mov         r8,5BFFFCC3E8h
mov         r8,qword ptr [r8]        ; "LocalCache.SetWithPriority"
mov         rcx,rsi                  ; key
mov         rdx,qword ptr [rsp+70h]  ; durationSecs
call        00007FFA7292EF80         ; OnLogDuration

We can see that durationSecs stores at the address [rsp+70h]. So far, everything looks fine.

Part 2. screen3 The call of OnLogDuration from RawSet:

mov         r8,5BFFFCC3F0h  
mov         r8,qword ptr [r8]        ; "RawSet"
mov         rcx,rsi                  ; key
mov         rdx,qword ptr [rsp+0C8h] ; durationSecs
call        00007FFA7292EF80         ; OnLogDuration

We can see that RyuJIT tries to read durationSecs from [rsp+0C8h]. But:

RSP                  = 0x0000005787AAD6C0
durationSecs address = 0x0000005787AAD790
durationSecs offset  = 0x00000000000000D0

As we can see, the calculated durationSecs offset is 0C8h, but the actual offset is 0D0h. In this case, the TCO calculation error is 8 bytes.