Ref returns restrictions in C# 7.0

Posted 2020-06-08 03:08

Question:

I am trying to understand the following excerpt from an official blog post about new features in C# 7.0 concerned with ref returns.

  1. You can only return refs that are “safe to return”: Ones that were passed to you, and ones that point into fields in objects.

  2. Ref locals are initialized to a certain storage location, and cannot be mutated to point to another.

Unfortunately, the blog post does not give any code example. I would greatly appreciate it if someone could shed more light on the two restrictions quoted above, with practical examples and an explanation.

Thanks in advance.

Answer 1:

You've got some answers that clarify the restriction, but not the reasoning behind the restriction.

The reasoning behind the restriction is that we must never allow an alias to a dead variable. If you have an ordinary local in an ordinary method, and you return a ref to it, then the local is dead by the time the ref is used.

Now, one might point out that a local that is returned by ref could be hoisted to a field of a closure class. Yes, that would solve the problem. But the point of the feature is to allow developers to write high-performance close-to-the-machine low-cost mechanisms, and automatically hoisting to a closure -- and then taking on the burdens of collection pressure and so on -- work against that goal.

Things can get slightly tricky. Consider:

ref int X(ref int y) { return ref y; }
ref int Z( )
{
  int z = 123;
  return ref X(ref z);
}

Here we are returning a ref to local z in a sneaky manner! This also has to be illegal. But now consider this:

ref double X(ref int y) { return ref whatever; }
ref double Z( )
{
  int z = 123;
  return ref X(ref z);
}

Now we can know that the returned ref is not the ref to z. So can we say that this is legal if the types of the refs passed in are all different than the types of the refs returned?

What about this?

struct S { public int s; }
ref int X(ref S y) { return ref y.s; }
ref int Z( )
{
  S z = default(S);
  return ref X(ref z);
}

Now once again we have returned a ref to a dead variable.

When we designed this feature for the first time (in 2010 IIRC) there were a number of complicated proposals to deal with these situations, but my favourite proposal was simply "make all of them illegal". You don't get to return a reference that you got returned by a ref-returning method, even if there is no way it could be dead.

I don't know what rule the C# 7 team ended up implementing.



Answer 2:

To be passed by reference, something must be classified as a variable. The C# specification (§5 Variables) defines seven categories of variables: static variables, instance variables, array elements, value parameters, reference parameters, output parameters and local variables.

class ClassName {
    public static int StaticField;
    public int InstanceField;
}
void Method(ref int i) { }
void Test1(int valueParameter, ref int referenceParameter, out int outParameter) {
    ClassName instance = new ClassName();
    int[] array = new int[1];
    outParameter=0;
    int localVariable = 0;
    Method(ref ClassName.StaticField);  //Static variable
    Method(ref instance.InstanceField); //Instance variable
    Method(ref array[0]);               //Array element
    Method(ref valueParameter);         //Value parameter
    Method(ref referenceParameter);     //Reference parameter
    Method(ref outParameter);           //Output parameter
    Method(ref localVariable);          //Local variable
}

The first point is actually saying that you can ref return variables classified as reference parameters, output parameters, static variables and instance variables (and, as the code below shows, array elements).

ref int Test2(int valueParameter, ref int referenceParameter, out int outParameter) {
    ClassName instance = new ClassName();
    int[] array = new int[1];
    outParameter=0;
    int localVariable = 0;
    return ref ClassName.StaticField;  //OK, "ones that point into fields in objects"
    return ref instance.InstanceField; //OK, "ones that point into fields in objects"
    return ref array[0];               //OK, array elements are also "safe to return" by reference
    return ref valueParameter;         //Error
    return ref referenceParameter;     //OK, "ones that were passed to you"
    return ref outParameter;           //OK, "ones that were passed to you"
    return ref localVariable;          //Error
}

Note that for instance fields of value types, you must consider the "safe to return" status of the enclosing variable. Returning such a field is not always allowed, unlike instance fields of reference types:

struct StructName {
    public int InstanceField;
}
ref int Test3() {
    StructName[] array = new StructName[1];
    StructName localVariable = new StructName();
    return ref array[0].InstanceField;      //OK, array[0] is "safe to return"
    return ref localVariable.InstanceField; //Error, localVariable is not "safe to return"
}

The result of a ref-returning method is considered "safe to return" only if none of the arguments passed to that method is itself not "safe to return":

ref int ReturnFirst(ref int i, ref int ignore) => ref i;
ref int Test4() {
    int[] array = new int[1];
    int localVariable = 0;
    return ref ReturnFirst(ref array[0], ref array[0]);      //OK, array[0] is "safe to return"
    return ref ReturnFirst(ref array[0], ref localVariable); //Error, localVariable is not "safe to return"
}

Although we know that ReturnFirst(ref array[0], ref localVariable) will return a "safe to return" reference (ref array[0]), the compiler cannot infer that by analyzing the Test4 method in isolation. So the result of ReturnFirst is, in that case, considered not "safe to return".

The second point says that a ref local variable declaration must include an initializer:
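
As a compilable counterpart to Test4, here is a minimal sketch of the same rule. The class name RefCallDemo and the FirstSlot wrapper are hypothetical names introduced only for illustration. It shows that forwarding the result of ReturnFirst is accepted when every ref argument is an array element, and that passing a local is still fine as long as the resulting ref is only used locally and never returned.

```csharp
using System;

public static class RefCallDemo
{
    public static ref int ReturnFirst(ref int i, ref int ignore) => ref i;

    // Legal: both ref arguments are array elements, so the result is "safe to return".
    public static ref int FirstSlot(int[] array) =>
        ref ReturnFirst(ref array[0], ref array[1]);

    public static void Main()
    {
        int[] data = { 1, 2 };
        FirstSlot(data) = 99;   // writes data[0] through the returned ref

        int local = 5;
        // A local may be passed when the result stays inside this method.
        ref int r = ref ReturnFirst(ref data[1], ref local);
        r = 7;                  // writes data[1]

        Console.WriteLine($"{data[0]} {data[1]}"); // prints "99 7"
    }
}
```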

int localVariable = 0;
ref int refLocal1;                     //Error, no initializer
ref int refLocal2 = ref localVariable; //OK

Also, a ref local variable cannot be reassigned to point to another storage location:

int localVariable1 = 0;
int localVariable2 = 0;
ref int refLocal = ref localVariable1;
ref refLocal = ref localVariable2;     //Error
refLocal = ref localVariable2;         //Error

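In C# 7.0 there is actually no valid syntax for reassigning a ref local variable. (C# 7.3 later added ref reassignment, which makes the refLocal = ref localVariable2; form legal.)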
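
The rule above describes C# 7.0. As a hedged aside, assuming a compiler targeting C# 7.3 or later, a ref local can be rebound with the "= ref" form. The sketch below (RefReassignDemo and Run are hypothetical names) contrasts writing through a ref local with rebinding it.

```csharp
using System;

public static class RefReassignDemo
{
    // Returns the final values of the two locals as "a b".
    public static string Run()
    {
        int a = 1, b = 2;
        ref int r = ref a;  // bound to a at declaration
        r = 10;             // ordinary assignment: writes through to a
        r = ref b;          // C# 7.3+: rebinds r to b (an error in C# 7.0)
        r = 20;             // now writes through to b
        return $"{a} {b}";
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "10 20"
    }
}
```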



Answer 3:

You can find a great discussion about this feature at GitHub - Proposal: Ref Returns and Locals.

1. You can only return refs that are “safe to return”: Ones that were passed to you, and ones that point into fields in objects.

The following example shows the return of a safe reference, because it comes from the caller:

public static ref TValue Choose<TValue>(ref TValue val)
{
    return ref val;
}

Conversely, a non-safe version of this example would be returning a reference to a local (this code would not compile):

public static ref TValue Choose<TValue>()
{
    TValue val = default(TValue);
    return ref val;
}

2. Ref locals are initialized to a certain storage location, and cannot be mutated to point to another.

The restriction means you must always initialize a ref local at its declaration. A declaration like

ref double aReference;

would not compile. Nor is it possible to assign a new reference to an already existing ref local, like

aReference = ref anOtherValue;
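
To tie both restrictions to something practical, here is a minimal sketch (RefFindDemo and Find are names chosen for illustration, not taken from the answer). Find returns a reference into an array supplied by the caller, which is safe to return because array elements live on the heap, and the caller binds a ref local to that location at its declaration.

```csharp
using System;

public static class RefFindDemo
{
    // Returns a reference to the first element equal to 'value'.
    // Safe to return: the ref points into a heap array, not into a local.
    public static ref int Find(int[] numbers, int value)
    {
        for (int i = 0; i < numbers.Length; i++)
        {
            if (numbers[i] == value)
                return ref numbers[i];
        }
        throw new InvalidOperationException("value not found");
    }

    public static void Main()
    {
        int[] numbers = { 1, 15, -39, 0, 7 };
        ref int slot = ref Find(numbers, 7); // ref local, bound at declaration
        slot = 9;                            // overwrites numbers[4] in place
        Console.WriteLine(numbers[4]);       // prints 9
    }
}
```

Because slot aliases numbers[4], the assignment needs no second lookup and no copy back into the array.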


Answer 4:

The other answers on this page are complete and useful, but I wanted to add an additional point, which is that out parameters, which your function is required to fully initialize, count as "safe to return" for the purposes of ref return.

Interestingly, combining this fact with another new C# 7 feature, inline declaration of 'out' variables, allows for the simulation of a general-purpose inline declaration of local variables capability:

helper function:

public static class _myglobals
{
    /// <summary> Helper function for declaring local variables inline. </summary>
    public static ref T local<T>(out T t)
    {
        t = default(T);
        return ref t;
    }
};

With this helper, the caller specifies the initialization of the "inline local variable" by assigning to the ref-return of the helper.

To demonstrate the helper, next is an example of a simple two-level comparison function which would be typical for an (e.g.) IComparable<MyObj>.CompareTo implementation. Although very simple, this type of expression can't get around needing a single local variable--without duplicating work, that is. Now normally, needing a local would block using an expression-bodied member which is what we'd like to do here, but the problem is easily solved with the above helper:

public int CompareTo(MyObj x) =>
                       (local(out int d) = offs - x.offs) == 0 ? size - x.size : d;

Walkthrough: Local variable d is "inline-declared," and initialized with the result of computing the first-level compare, based on the offs fields. If this result is inconclusive, we fall back to returning a second level sort (based on the size fields). But in the alternative, we do still have the first-level result available to return, since it was saved in local d.

Note that the above can also be done without the helper function, via C# 7 pattern matching:

public int CompareTo(MyObj x) =>
                       (offs - x.offs) is int d && d != 0 ? d : size - x.size;

include at the top of your source files:

using System;
using /* etc... */
using System.Xml;
using Microsoft.Win32;

using static _myglobals;    //  <-- puts function 'local(...)' into global name scope

namespace MyNamespace
{
   // ...

The following examples show declaring a local variable inline with its initialization in C# 7. If initialization is not provided, it obtains default(T), as assigned by the local<T>(out T t) helper function. This has only become possible with the ref return feature, since a call to a ref-returning method is the only kind of method call that can be used as an ℓ-value.

example 1:

var s = "abc" + (local(out int i) = 2) + "xyz";   //   <-- inline declaration of local 'i'
i++;
Console.WriteLine(s + i);   //   -->  abc2xyz3

example 2:

if ((local(out OpenFileDialog dlg) = new OpenFileDialog       // <--- inline local 'dlg'
    {
        InitialDirectory = Environment.CurrentDirectory,
        Title = "Pick a file",
    }).ShowDialog() == true)
{
    MessageBox.Show(dlg.FileName);
}

The first example trivially assigns from an integer literal. In the second example, the inline local dlg is assigned from a constructor (new expression), and then the entire assignment expression is used for its resolved value to call an instance method (ShowDialog) on the newly created instance. For precise clarity as a standalone example, it finishes by showing that the referred instance of dlg did indeed need to be named as a variable, in order to fetch one of its properties.


[edit:] Regarding...

2. Ref locals are initialized to a certain storage location, and cannot be mutated to point to another.

...it would certainly be nice to have a ref variable with a mutable referent, since this would help avoid expensive indexing bounds checks within loop bodies. Of course, that's also precisely why it's not allowed. You probably can't get around this (i.e. ref to an array access expression with indexing containing ref won't work; it gets permanently resolved to the element at the referenced position when initialized) but if it helps, note that you can take a ref to a pointer, and this includes ref local:

// Requires an unsafe context (an unsafe method or block, compiled with /unsafe).
int i = 5, j = 6;

int* pi = &i;
ref int* rpi = ref pi;

Console.WriteLine(i + " " + *pi + " " + *rpi);      //   "5 5 5"

pi = &j;

Console.WriteLine(i + " " + *pi + " " + *rpi);      //   "5 6 6"

The point of this admittedly pointless example code is that, although we didn't alter ref local variable rpi itself in any way (since 'ya can't), it does now have a different (ultimate) referent.


More seriously, what ref local does now allow for, as far as tightening up the IL in array-indexing loop bodies, is a technique I call value-type stamping. This allows for efficient IL in loop bodies which need to access multiple fields of each element in an array of value-types. Typically, this has been a trade-off between external initialization (newobj / initobj) followed by a single indexing access versus in-situ non-initialization but with the expense of redundant multiple runtime indexing.

With value-type stamping however, now we can entirely avoid per-element initobj / newobj IL instructions and also have just a single indexing computation at runtime. I'll show the example first, and then describe the technique in general below.

/// <summary>
/// Returns a new array of (int,T) where each element of 'src' is paired with its index.
/// </summary>
public static (int Index, T Item)[] TagWithIndex<T>(this T[] src)
{
    if (src.Length == 0)
        return new (int, T)[0];

    var dst = new (int Index, T Item)[src.Length];     // i.e, ValueTuple<int,T>[]
    ref var p = ref dst[0];      //  <--  co-opt element 0 of target for 'T' staging

    ref int i = ref p.Index;  //  <-- index field in target will also control loop
    i = src.Length;    

    while (true)
    {
        p.Item = src[--i];
        if (i == 0)
            return dst;
        dst[i] = p;
    }
}

The example shows a concise yet extreme use of the value-type stamping technique; you can discern its twist (given away in a comment) on your own if you're interested. In what follows, I'll instead discuss the value-type stamping technique in more general terms.

First, prepare ref locals with references directly to the relevant fields in a staging instance of the value-type. This can be either on the stack, or, as shown in the example, co-opted from the last-to-be-processed element of the target array itself. It may be valuable to have a ref to the entire staging instance as well, especially if using the co-opting technique.

Each iteration of the loop body can then prepare the staging instance very efficiently, and as a final step when ready, "stamp" it wholesale into the array with only a single indexing operation. Of course, if the final element of the array was co-opted as the staging instance, then you can also leave the loop slightly earlier.
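
The general recipe above can be sketched with the other variant mentioned, a staging instance on the stack instead of a co-opted array element. Everything here (the Sample struct, StampingDemo, Build, and the doubling of each value) is a hypothetical illustration, and the sketch makes no claim about the IL or performance actually obtained.

```csharp
using System;

public struct Sample
{
    public int Index;
    public double Value;
}

public static class StampingDemo
{
    public static Sample[] Build(double[] values)
    {
        var dst = new Sample[values.Length];

        Sample staging = default(Sample);     // staging instance on the stack
        ref int index = ref staging.Index;    // ref locals aimed at its fields
        ref double value = ref staging.Value;

        for (int k = 0; k < dst.Length; k++)
        {
            index = k;                 // prepare the staging instance field by field
            value = values[k] * 2.0;
            dst[k] = staging;          // "stamp": one indexing operation per element
        }
        return dst;
    }

    public static void Main()
    {
        Sample[] result = Build(new[] { 1.5, 2.5 });
        Console.WriteLine($"{result[1].Index} {result[1].Value}"); // prints "1 5"
    }
}
```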



Tags: c# c#-7.0