roslyn: Proposal: Ref Returns and Locals

(Note: this proposal was briefly discussed in dotnet/roslyn#98, the C# design notes for Jan 21, 2015. It has not been updated based on the discussion that’s already occurred on that thread.)

Background

Since the first release of C#, the language has supported passing parameters by reference using the ‘ref’ keyword. This is built on top of direct support in the runtime for passing parameters by reference.

Problem

Interestingly, that support in the CLR is actually a more general mechanism for passing around safe references to heap memory and stack locations. It could be used to implement ref return values and ref locals, but C# historically has not provided any mechanism for doing this in safe code. Instead, developers who want to pass around structured blocks of memory are often forced to do so with pointers to pinned memory, which is both unsafe and often inefficient.

Solution: ref returns

The language should support the ability to declare ref locals and ref return values. We could, for example, now declare a function like the following, which not only accepts ‘ref’ parameters but which also has a ref return value:

public static ref TValue Choose<TValue>(
    Func<bool> condition, ref TValue left, ref TValue right)
{
    return condition() ? ref left : ref right;
}

With a method like that, one can now write code that passes two values by reference, with one of them being returned based on some condition:

Matrix3D left = …, right = …;
Choose(chooser, ref left, ref right).M20 = 1.0;

Based on the function that gets passed in here, a reference to either ‘left’ or ‘right’ will be returned, and its M20 field will be set. Since we’re trading in references, the value stored in ‘left’ or ‘right’ itself is updated rather than a temporary copy, and no large structure has to be copied along the way.
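For comparison, here is a minimal sketch of the same idea written in the ref-return syntax that later shipped in C# 7.0. The Matrix3D type below is a hypothetical stand-in with a single field, and the if/else form is used instead of the conditional expression shown in the proposal.

```csharp
using System;

// Hypothetical stand-in for the Matrix3D mentioned above; only M20 matters here.
public struct Matrix3D
{
    public double M20;
}

public static class RefReturnDemo
{
    // Returns a reference to one of the two arguments, not a copy.
    public static ref TValue Choose<TValue>(
        Func<bool> condition, ref TValue left, ref TValue right)
    {
        if (condition())
        {
            return ref left;
        }
        return ref right;
    }

    public static void Main()
    {
        Matrix3D left = default, right = default;

        // The assignment writes through the returned reference,
        // so the caller's 'right' variable itself is updated.
        Choose(() => false, ref left, ref right).M20 = 1.0;

        Console.WriteLine(left.M20);
        Console.WriteLine(right.M20);
    }
}
```

Because the call expression is itself a variable reference, it can appear on the left-hand side of an assignment without any extra syntax at the call site.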

If we don’t want the returned reference to be writable, we could apply ‘readonly’ just as we were able to do earlier with ‘ref’ on parameters (extending the proposal mentioned in dotnet/roslyn#115 to also support return refs):

public static readonly ref TValue Choose<TValue>(
    Func<bool> condition, ref TValue left, ref TValue right)
{
    return condition() ? ref left : ref right;
}
…
Matrix3D left = …, right = …;
Choose(chooser, ref left, ref right) = new Matrix3D(...); // Error: returned reference is read-only

Note that when referencing the ‘left’ and ‘right’ ref arguments in the Choose method’s implementation, we used the ‘ref’ keyword. This would be required by the language, just as it’s required to use the ‘ref’ keyword when passing a value to a ‘ref’ parameter.

Solution: ref locals

Once you have the ability to receive ‘ref’ parameters and to return ‘ref’ return values, it’s very handy to be able to define ‘ref’ locals as well. A ‘ref’ local can be set to anything that’s safe to return as a ‘ref’ return, which includes references to variables on the heap, ‘ref’ parameters, ‘ref’ values returned from a call to another method where all ‘ref’ arguments to that method were safe to return, and other ‘ref’ locals.

public static ref int Max(ref int first, ref int second, ref int third)
{
    ref int max = first > second ? ref first : ref second;
    return max > third ? ref max : ref third;
}
…
int a = 1, b = 2, c = 3;
Max(ref a, ref b, ref c) = 4;
Debug.Assert(a == 1); // true
Debug.Assert(b == 2); // true
Debug.Assert(c == 4); // true

We could also use ‘readonly’ with ref on locals (again, see dotnet/roslyn#115), to ensure that the ref variables don’t change. This would work not only with ref parameters, but also with ref locals and ref returns:

public static readonly ref int Max(
    readonly ref int first, readonly ref int second, readonly ref int third)
{
    readonly ref int max = first > second ? ref first : ref second;
    return max > third ? ref max : ref third;
}
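As a point of comparison, C# 7.2 later shipped read-only references with a different spelling: ‘in’ for parameters and ‘ref readonly’ for returns and locals. The sketch below assumes that shipped syntax rather than the ‘readonly ref’ form proposed here, and uses plain if statements instead of the conditional expression.

```csharp
using System;

public static class ReadOnlyRefDemo
{
    // 'in' passes each argument by read-only reference;
    // 'ref readonly' returns a reference the caller cannot write through.
    public static ref readonly int Max(in int first, in int second, in int third)
    {
        if (first >= second && first >= third)
        {
            return ref first;
        }
        if (second >= third)
        {
            return ref second;
        }
        return ref third;
    }

    public static void Main()
    {
        int a = 1, b = 2, c = 3;

        // 'largest' aliases 'c' but is read-only from here.
        ref readonly int largest = ref Max(in a, in b, in c);
        Console.WriteLine(largest);

        // largest = 4; // would not compile: the reference is read-only
    }
}
```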

About this issue

  • State: closed
  • Created 9 years ago
  • Reactions: 72
  • Comments: 167 (61 by maintainers)

Most upvoted comments

If I recall, Eric Lippert blogged about this some years back and the response in the comments was largely negative.

I do not like this feature for C#. The resulting code is like an uglier version of C++, and code written with it takes longer to reason about and understand. The use-cases are not particularly compelling, and I have never run into a situation where I wished I had ref locals or return values.

Disclaimer: I work on game engine, so I am probably not the typical user.

One use case where this could really help us is the following:

MyHugeStruct[] data; // we use a struct to improve data locality and reduce GC pressure
// Ideally, we would like to be able to use List<T>, but we can't take ref then
for (int i = 0; i < data.Length; ++i)
{
   // Option 1: make a local copy (slow)
   var item = data[i];

   // Option 2: to avoid making a stack copy of MyHugeStruct,
   // we have to defer to an inner loop function
   MyLoopBody(ref data[i]);

   // Option 3: using the new proposal, this would be much better:
   ref MyHugeStruct itemRef = ref data[i];
}

We end up writing a separate function for the loop body, and for a tight loop this can turn out quite badly:

  • We have to forward all parameters.
  • With VTune we sometimes found that the inner function’s stack “initlocals” was taking up most (80%+) of the time when the loop body happened to declare several locals (even if only 0 or 1 of them was used due to branching). This would not happen if the locals were declared in the function containing the “for” loop and zeroed once.
  • It is not inlined in simple cases.

Nice to have:

  • ref this[] operator(?) so that List<> and other collections can be used (vs being forced to use arrays)
  • a ++ operator on ref, to be able to loop by incrementing a pointer instead of multiplying an index (but probably unsafe).

Extra (probably impossible without changing BCL):

  • A lot of struct copies could also be avoided in EqualityComparer (Dictionary) if ref could be used when large structs are used as keys.
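A rough sketch of how the loop and the “ref this[]” wish from this comment can look with the ref locals and ref-returning indexers that later shipped in C# 7.0. StructBuffer is a hypothetical fixed-size collection invented for illustration, and the fields of MyHugeStruct are placeholders.

```csharp
using System;

// Hypothetical large struct standing in for MyHugeStruct.
public struct MyHugeStruct
{
    public long A, B, C, D;
    public int Counter;
}

// Hypothetical minimal collection with a ref-returning indexer.
public sealed class StructBuffer
{
    private readonly MyHugeStruct[] _items;

    public StructBuffer(int length) => _items = new MyHugeStruct[length];

    public int Length => _items.Length;

    // Hands out a reference to the stored element instead of a copy.
    public ref MyHugeStruct this[int index] => ref _items[index];
}

public static class LoopDemo
{
    public static void IncrementAll(StructBuffer buffer)
    {
        for (int i = 0; i < buffer.Length; ++i)
        {
            // No copy and no helper method: 'item' aliases the stored element.
            ref MyHugeStruct item = ref buffer[i];
            item.Counter++;
        }
    }

    public static void Main()
    {
        var buffer = new StructBuffer(3);
        IncrementAll(buffer);
        Console.WriteLine(buffer[0].Counter);
    }
}
```

The buffer here never resizes. A growable collection would need more care, since a reference handed out before a resize would keep pointing into the old backing array.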

Beautiful solution, I’ve wondered why this couldn’t be done before.

@MgSam [The resulting code is like an uglier version of C++] Because of sentiments like this (i.e. ‘anything I don’t personally use should never be part of the language for anybody else either, even though the CLR itself has this capability’), it means our language is needlessly crippled in places where a very easy and beautiful solution like this gives us such a capability. As the gamer showed in the comment above, this can be a big performance win in some cases.

Closing as this is now implemented.

This proposal would increase the complexity of the language, without a significant use case or benefit to offset that complexity. I’m not convinced.

As far as mutable structs are concerned: They are the ONLY way to get large mutable arrays with good locality, period. When it comes to performance, locality of reference often dominates all else.

@JeroMiya I have heard words like yours from so many people who were spoiled by fast computers and don’t know what a new object does to memory, how the GC works or how it impacts performance, and how structs could be used. People like these always make every little thing a class, generate lots of small garbage in memory, and box things out of ignorance.

@jaredpar that’s very saddening; I just have to pray 99% of people don’t touch this feature.

The strange unicode quotes aren’t valid syntax, but allowing ref returns to be lvalues is kind of exactly the point here. From the first comment:

int a = 1, b = 2, c = 3;
Max(ref a, ref b, ref c) = 4;
Debug.Assert(a == 1); // true
Debug.Assert(b == 2); // true
Debug.Assert(c == 4); // true

If I were writing, say, a video game, or some other set of algorithms that required me to go out of my way to avoid GC (such as a web server), being able to write code like this can be very useful.

We aren’t looking to make this kind of thing pretty, merely for it to be better performing than what you have to currently do (either add another method override to do something and the IL instructions to add the new parameters, or mess around with pointers in C# unsafe code, or write the CIL yourself without the help of the C# language).

👍

Anytime I can pass a pointer instead of performing a value copy, I’m all for it. Are there good reasons to pass memory by value-copy? Yes. Should it always be the case? Absolutely not.

The resulting code is like an uglier version of C++

I agree, it is not pretty but it is very descriptive. It would be nice if the ref keyword could be replaced with syntax we’re all used to. Perhaps we could use * in place of ref because int* foo; is “cleaner” and “easier” to read than ref int foo;. I put “cleaner” and “easier” in quotes because it is incredibly subjective.

Yes, I know that * is generally reserved for unsafe but there’s no reason the symbol cannot be reused, so long as one is reserved for a “safe” context and the other for an “unsafe” context.

@dotnetchris yes. it should even allow

foo.GetByRef("x") = foo.GetByRef("x")

@jaredpar Ah, sorry, maybe I have not been clear enough. I’m not proposing to re-point the ref (though I have never had a need for this, but hey, the idea could grow on me 😄), but to disallow the variable (and of course the struct behind it) from being reassigned entirely.

Let me take an example for a readonly ref scenario:

struct MyStruct
{
   public readonly int X;
   public int Y; 
}

public void Process(readonly ref MyStruct val)
{
   // This would not compile
   // In this case, we also disallow the field X to be modified
   // while with a regular ref, we could modify it indirectly with the following code
   val = new MyStruct();  
   // We cannot do this
   val.X++;
   // But we can do this:
   val.Y++;
   ....
}

This typically protects the variable as well as the readonly fields behind it, which is a nice behavior because it allows partial immutability of a ref struct. A caller passing this struct can be sure that the callee will not be able to modify its readonly fields (or even private ones).

On the other hand, ref readonly would allow passing a readonly field or variable to another method:

class MyClass
{
     public static readonly MyStruct MyField;
}

public static void Process(ref readonly MyStruct val)
{
    // We cannot do this:
    val = new MyStruct();
    // And also we cannot do this:
    val.Y++;
}

Process(ref MyClass.MyField); // It would be possible

Hope it makes more sense 😅

Yes, definitely a different issue; readonly structs are problematic. See https://github.com/dotnet/roslyn/issues/115 as the main one, plus additionally https://github.com/dotnet/csharplang/issues/6628 and https://github.com/dotnet/roslyn/issues/3202

ref returns and locals as they stand are an amazing addition! Thank you very much!

@bbarry Because what we want to return may be a value type. The async method may do the work of loading and initializing a large struct, and once it finishes it will just return that struct. So the struct may live inside some object on the heap, but we don’t want to give the client access to that class. We want to return the value, but by reference for performance reasons.

struct UserData { /* Very Large detailed information */ }

class UserManager
{
    UserData[] array;
    public static async ref UserData LoadOrGet(int n)
    {
        // First find value in array, return ref array[n] if exist
        // await for load from server, set to array[n + 1] and return after loaded
    }
}

I think some people here are too enthusiastic about immutability. But it defeats the purpose of this request in the first place, which is performance when controlling the value of a struct (performance is the reason we have structs in the first place).

If you want your struct to be immutable you should explicitly use readonly or const

Also, I would suggest that in addition to ref returns and ref locals, it would be useful to have const returns, locals, and parameters:

int[] numbers = new int[]{0,1,2};
public const int ConstValue(const int index)
{
    return const numbers[index];
}
public ref int RefValue(const int index)
{
    return ref numbers[index];
}

A const parameter would be passed by ref, but no code could be written to modify it, which is useful if we have a large immutable struct.

I’m testing ref locals and I’ve encountered the following limitation. I declare a ref int variable as a reference to the first element of an array. How can I change where this variable points? Let’s say I want to change it to point to the last element; is that not possible? (See my attempts below.)

static int[] data = new int[] { 0, 1, 2, 3, 4 };
unsafe static void Main(string[] args)
{
    ref int slot = ref data[0];

    slot = data[4]; // This stores value "4" into data[0], I don't want that

    // This does not work
    //ref int slot = ref data[4]; // This would be it, except variable is already declared
    //slot = ref data[4];
    //ref slot = ref data[4];

    slot = 99; // When it works, this would overwrite last element

    foreach(var item in data)
    {
        Console.WriteLine(item);
    }
    Console.ReadKey();
}

I thought I’d try to compare the performance of this approach in my tree-like collection implemented on an array. Currently, when looking for an element to add/remove, I have locals like int currentIndex, int parentIndex. I thought I would use ref Node current, ref Node parent instead, but if it’s not possible to modify current in a while loop, that won’t work.
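For reference, C# 7.3 later added ref reassignment, spelled like the second commented-out attempt above (slot = ref data[4];). A minimal sketch assuming a C# 7.3 or later compiler, with the class and method names invented for illustration:

```csharp
using System;

public static class RefReassignDemo
{
    public static int[] Run()
    {
        int[] data = { 0, 1, 2, 3, 4 };

        ref int slot = ref data[0];

        // Ref reassignment: 'slot' now aliases data[4]; no value is copied.
        slot = ref data[4];

        // Ordinary assignment writes through the reference into data[4].
        slot = 99;

        return data;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "0, 1, 2, 3, 99"
    }
}
```

With this, a loop can keep a ref local such as current and re-point it on each iteration instead of tracking an index.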

@xoofx

The ability to assign to a struct location and call methods without a copy are equivalent operations. Adding protection for one without protection for the other is just lulling developers into a false sense of confidence about their code.

This all has to do with how this is modeled. In a struct the type of this is ref T. Hence whenever you call a method on a struct the target must be convertible to ref T. That is why it’s wrong from a language correctness standpoint to allow readonly ref to call a method without a copy. It’s implying there is a conversion between readonly ref T and ref T.

+1 to readonly ref!

When you deal with large structs and want to avoid copies (think Matrix), ref makes a lot of sense. And of course, we want to have predefined values as static readonly (i.e. Matrix.Identity).

The problem is that we can’t use any of the Matrix methods that take a ref with those static readonly values (i.e. Matrix.Multiply(ref Matrix.Identity, ref matrix2)). The only way is to make a full copy beforehand, or to get rid of the readonly (bringing a lot of safety issues).
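A sketch of how this case can be written with the ‘in’ parameters that later shipped in C# 7.2, which accept a static readonly field as an argument. The 2x2 Matrix type and its Multiply signature below are hypothetical, kept tiny for illustration.

```csharp
using System;

// Hypothetical 2x2 matrix; a real one would be larger.
public struct Matrix
{
    public double M11, M12, M21, M22;

    public static readonly Matrix Identity = new Matrix { M11 = 1, M22 = 1 };

    // 'in' takes both operands by read-only reference, so no copy is made
    // and a static readonly field is a valid argument.
    public static void Multiply(in Matrix a, in Matrix b, out Matrix result)
    {
        result = new Matrix
        {
            M11 = a.M11 * b.M11 + a.M12 * b.M21,
            M12 = a.M11 * b.M12 + a.M12 * b.M22,
            M21 = a.M21 * b.M11 + a.M22 * b.M21,
            M22 = a.M21 * b.M12 + a.M22 * b.M22,
        };
    }
}

public static class InParameterDemo
{
    public static void Main()
    {
        var m = new Matrix { M11 = 2, M12 = 3, M21 = 4, M22 = 5 };

        Matrix.Multiply(in Matrix.Identity, in m, out Matrix product);
        Console.WriteLine(product.M12);
    }
}
```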

@benaadams

However a readonly type extension to both ref arguments and ref locals and returns probably would also be a useful addition (

I agree. But for it to be useful you need to take it one step further. Consider for example this code:

void M(readonly ref BigStruct s)
{
  Console.WriteLine(s.ToString());
}

In this case the argument is taken by ref presumably to avoid copying a large struct. However in order to execute the ToString call the compiler will fully copy the value to the stack. Oops 😦

This is the behavior of C# when you call a struct method on a readonly location. Without a copy it would be possible for the struct to violate readonly by modifying its state within the method.

This logic doesn’t just apply to methods, but to properties as well. Hence passing a struct by readonly ref is only advantageous compared to passing by value if you read fields off of it. With any use of methods or properties, you’re better off passing it the standard way.

In order to get around this we need to be able to mark struct methods in such a way that the compiler knows they aren’t mutating. That way it can invoke the method directly vs. having to go through a copy on the stack.

There are two proposals for how to do that:

  • readonly structs: ability to tag an entire struct as readonly. For such structs the type of this in non-constructor members would be readonly ref T instead of ref T.
  • readonly members on structs: ability to tag a struct member as readonly. For that member the type of this would be readonly ref T.
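Both ideas later shipped in some form: ‘readonly struct’ in C# 7.2 and readonly instance members in C# 8.0. The sketch below assumes those shipped spellings, with ‘in’ standing in for the ‘readonly ref’ parameter discussed here; the Counter type and the fields of BigStruct are invented for illustration.

```csharp
using System;

// Every instance member of a 'readonly struct' is non-mutating.
public readonly struct BigStruct
{
    public readonly double X, Y, Z, W;

    public BigStruct(double x, double y, double z, double w)
    {
        X = x; Y = y; Z = z; W = w;
    }

    public double Sum() => X + Y + Z + W;
}

// A mutable struct can instead mark individual members as readonly.
public struct Counter
{
    public int Value;

    public readonly int Doubled() => Value * 2; // promises not to modify 'this'

    public void Increment() => Value++;
}

public static class ReadOnlyMemberDemo
{
    // Calling non-mutating members through 'in' needs no defensive copy.
    public static double Total(in BigStruct s) => s.Sum();

    public static int Twice(in Counter c) => c.Doubled();

    public static void Main()
    {
        var big = new BigStruct(1, 2, 3, 4);
        var counter = new Counter { Value = 21 };

        Console.WriteLine(Total(in big));
        Console.WriteLine(Twice(in counter));
    }
}
```

Calling Increment() through an ‘in’ parameter would still compile, but it would run on a hidden copy, which is the behavior described in the comment above.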

Is it correct to understand that only references to local variables are not safe to return? And if so, why not just insert a runtime check in the caller that throws an exception if the returned reference points to something that lies in the discarded stack frame? That shouldn’t be too much of a performance hit, and it can be optimized away if the compiler can prove that the returned reference is safe.

@HaloFour Would it help if I gave you the equivalent C++ syntax?

Here: (&values[i]) = initData;

I guarantee, this works exactly as I’ve actually stated.

And no, you don’t need an array of refs to get a reference to a location in an array.

Correct, that would not compile.