Why the compiler emits box instructions to compare

2019-01-21 19:09发布

Here is a simple generic type with a unique generic parameter constrained to reference types:

class A<T> where T : class
{
    public bool F(T r1, T r2)
    {
        return r1 == r2;
    }
}

The generated IL by csc.exe is :

ldarg.1
box        !T
ldarg.2
box        !T
ceq

So each parameter is boxed before proceeding with the comparison.

But if the constraint indicates that "T" should never be a value type, why is the compiler trying to box r1 and r2 ?

2条回答
我命由我不由天
2楼-- · 2019-01-21 19:45

It's required to satisfy the verifiability constraints for the generated IL. Note that unverifiable doesn't necessarily mean incorrect. It works just fine without the box instruction as long as its security context allows running unverifiable code. Verification is conservative and is based on a fixed rule set (like reachability). To simplify things, they chose not to care about presence of generic type constraints in the verification algorithm.

Common Language Infrastructure Specification (ECMA-335)

Section 9.11: Constraints on generic parameters

... Constraints on a generic parameter only restrict the types that the generic parameter may be instantiated with. Verification (see Partition III) requires that a field, property or method that a generic parameter is known to provide through meeting a constraint, cannot be directly accessed/called via the generic parameter unless it is first boxed (see Partition III) or the callvirt instruction is prefixed with the constrained prefix instruction. ...

Removing the box instructions will result in unverifiable code:

.method public hidebysig instance bool 
       F(!T r1,
         !T r2) cil managed
{
   ldarg.1
   ldarg.2
   ceq
   ret
}


c:\Users\Mehrdad\Scratch>peverify sc.dll

Microsoft (R) .NET Framework PE Verifier.  Version  4.0.30319.1
Copyright (c) Microsoft Corporation.  All rights reserved.

[IL]: Error: [c:\Users\Mehrdad\Scratch\sc.dll : A`1[T]::F][offset 0x00000002][fo
und (unboxed) 'T'] Non-compatible types on the stack.
1 Error(s) Verifying sc.dll

UPDATE (Answer to the comment): As I mentioned above, verifiability is not equivalent to correctness (here I'm talking about "correctness" from a type-safety point of view). Verifiable programs are a strict subset of correct programs (i.e. all verifiable programs are demonstrably correct, but there are correct programs that are not verifiable). Thus, verifiability is a stronger property than correctness. Since C# is a Turing-complete language, Rice's theorem states that proving that programs are correct is undecidable in general case.

Let's get back to my reachability analogy as it's easier to explain. Assume you were designing C#. One thing have thought about is when to issue a warning about unreachable code, and to remove that piece of code altogether in the optimizer, but how you are going to detect all unreachable code? Again, Rice's theorem says you can't do that for all programs. For instance:

void Method() {
    while (true) {
    }
    DoSomething();  // unreachable code
}

This is something that C# compiler actually warns about. But it doesn't warn about:

bool Condition() {
   return true;
}

void Method() {
   while (Condition()) {
   }
   DoSomething();  // no longer considered unreachable by the C# compiler
}

A human can prove that control flow never reaches that line in the latter case. One could argue that the compiler could statically prove DoSomething is unreachable in this case too, but it doesn't. Why? The point is you can't do that for all programs, so you should draw the line at some point. At this stage, you have to define a decidable property and call it "reachability". For instance, for reachability, C# sticks to constant expressions and won't look at the contents of the functions at all. Simplicity of analysis and design consistency are important goals in deciding where to draw the line.

Going back to our verifiability concept, it's a similar problem. Verifiability, unlike correctness, is a decidable property. As the runtime designer, you have to decide how to define verifiability, based on performance considerations, easy of implementation, ease of specification, consistency, making it easy for the compiler to confidently generate verifiable code. Like most design decisions, it involves a lot of trade-offs. Ultimately, the CLI designers have decided that they prefer not too look at generic constraints at all when they are checking for verifiability.

查看更多
太酷不给撩
3楼-- · 2019-01-21 19:46

Mehrdad's answer is quite good; I just wanted to add a couple points:

First, yes, in this case this is just to keep the verifier happy. The jitter should of course optimize away the boxing instruction since it is not meaningful to box a reference type.

However, there are cases where to keep the verifier happy we must introduce boxing instructions that are not optimized away. For example, if you say:

class B<T> { public virtual void M<U>(U u) where U : T {...} }
class D : B<int> 
{ 
    public override void M<U>(U u)
    {

The C# compiler knows that in D.M, U can only be int. Nevertheless, in order to be verifiable there are situations where u must be boxed to object and then unboxed to int. The jitter does not always optimize away these; we've pointed out to the jitter team that this is a possible optimization, but the situation is so obscure that it is unlikely to cause a big win for many actual customers. There are bigger-bang-for-buck optimizations they could be spending their time on.

查看更多
登录 后发表回答