How do casts actually work at the CLR level?

Posted 2019-02-13 15:51

When you upcast or downcast, what really happens behind the scenes? I had the idea that when doing something like:

string myString = "abc";
object myObject = myString;
string myStringBack = (string)myObject;

the cast in the last line would serve only to tell the compiler that we know what we are doing and are not doing anything wrong. So I assumed that no casting code would actually be embedded in the compiled code. It seems I was wrong:

.maxstack 1
.locals init (
    [0] string myString,
    [1] object myObject,
    [2] string myStringBack)
L_0000: nop 
L_0001: ldstr "abc"
L_0006: stloc.0 
L_0007: ldloc.0 
L_0008: stloc.1 
L_0009: ldloc.1 
L_000a: castclass string
L_000f: stloc.2 
L_0010: ret 

Why does the CLR need something like castclass string?

There are two possible implementations for a downcast:

  1. You require a castclass something. When execution reaches the castclass instruction, the CLR tries to perform the cast. But then, what would happen had I omitted the castclass string line and tried to run the code?
  2. You don't require a castclass. As all reference types have a similar internal structure, if you try to use a Form instance as a string, the runtime will throw a wrong-usage exception (because it detects that a Form is not a string or any of its subtypes).

Also, is the following statement from C# 4.0 in a Nutshell correct?

Upcasting and downcasting between compatible reference types performs reference
conversions: a new reference is created that points to the same object.

Does it really create a new reference? I thought it'd be the same reference, only stored in a different type of variable.

Thanks

2 answers
冷血范
#2 · 2019-02-13 16:12

I had the idea that actually no casting code would be embedded in the code itself.

An interesting idea. How did you imagine that this worked?

try
{
    object x = 123;
    object y = (string)x;
}
catch(InvalidCastException ex)
{ ... }

If the cast produces no code then where does the code that throws the exception happen?

Remember, the primary purpose of a cast from a less specific type to a more specific type is to perform a runtime type check.

Once the type check passes, then sure, nothing else really has to happen. The bits of the reference before the type check and the bits after the type check are the same bits; we've just had the runtime verify that the new usage of the old bits is justified.
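
To make both halves of this concrete, here is a minimal sketch (the class and method names are hypothetical, chosen only for illustration). It shows that a successful downcast hands back the very same reference, and that a failing one throws at the cast itself:

```csharp
using System;

public static class CastDemo
{
    // Returns true when the downcast yields the very same object reference.
    public static bool DowncastPreservesReference()
    {
        string original = "abc";
        object asObject = original;       // upcast: always safe, no runtime check
        string back = (string)asObject;   // downcast: runtime type check (castclass)
        return ReferenceEquals(original, back);
    }

    // Returns true when an invalid downcast throws InvalidCastException.
    public static bool InvalidDowncastThrows()
    {
        object notAString = 123;          // a boxed int, not a string
        try
        {
            string s = (string)notAString; // the type check fails on this line
            return false;                  // not reached
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(DowncastPreservesReference()); // True
        Console.WriteLine(InvalidDowncastThrows());      // True
    }
}
```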

if you try to use a Form instance as a string, the runtime will throw a wrong-usage exception (because it detects that a Form is not a string or any of its subtypes).

Where does it detect that? I mean, in exactly which instruction is that detected? In the castclass instruction. That's what the castclass instruction is for.

what would happen had I omitted the castclass string line and tried to run the code?

The type safety verifier would have rejected your program. Had you forced the CLR to run it without passing verification then it would have had undefined behaviour. It might have succeeded, it might have failed, it might have formatted your hard disk.

Does it really create a new reference?

Remember, at the implementation level a reference is just a pointer-sized integer. It's a number that the memory manager can use to track the position of the referred-to data. It might be a pointer, it might be a handle, it doesn't matter what it is; it's something that implements the abstract notion of a reference.

When you have a variable that contains 12 and you "replace" its contents with 12, is that a "new" 12 that has just been created or is it the "old" 12? Suppose you make a second variable and put 12 in it too by copying from the first variable. Is that a "new" 12 or the "old" 12? How can you tell? It's a difference that makes no difference. When you make a "new" reference that is identical to an "old" reference is that creating something new? The question is a philosophical question, not a technical one.

霸刀☆藐视天下
#3 · 2019-02-13 16:29

You're confusing reference with instance. A new reference is created, not a new instance.

object foo = "bar";
string baz = (string)foo;

A new reference to the string "bar" is assigned to the baz variable (but there is still only one instance of the string; both variables simply point to that single instance). Were this not the case, you would have something akin to a "handle" type. If baz and foo were literally the same reference, then this:

foo = "bim";

would also make baz equal to "bim" (likewise, assigning a non-string value to foo would make baz no longer point to a valid string).
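
The point can be checked with a short sketch (the class and method names are hypothetical). The two variables share one instance after the cast, yet rebinding foo leaves baz untouched:

```csharp
using System;

public static class ReferenceCopyDemo
{
    // Returns true when both variables point at the same string instance.
    public static bool SharesInstanceAfterCast()
    {
        object foo = "bar";
        string baz = (string)foo;   // baz gets its own copy of the reference
        return ReferenceEquals(foo, baz);
    }

    // Returns what baz holds after foo has been pointed at another string.
    public static string BazAfterReassigningFoo()
    {
        object foo = "bar";
        string baz = (string)foo;
        foo = "bim";                // rebinds foo only; baz is a separate variable
        return baz;
    }

    public static void Main()
    {
        Console.WriteLine(SharesInstanceAfterCast());  // True
        Console.WriteLine(BazAfterReassigningFoo());   // bar
    }
}
```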

You can perform a cast on a reference type either when the two types are in the same inheritance hierarchy (one inherits from the other, directly or indirectly) or when an explicit conversion between the types exists. Note that explicit conversions, like all other operators, are not polymorphic: the conversion must be defined specifically on one of the classes in question, not at another point in the hierarchy.

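When a user-defined explicit conversion applies, the cast calls that conversion operator instead of performing a reference conversion, and you have no guarantee (in fact, it's quite unlikely) that the result of the cast/conversion will point to the same instance as the object being cast. Note that C# does not allow a user-defined conversion between a class and its own base or derived class, so this case only arises between types that are not already related by inheritance.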
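
As a rough illustration, the sketch below uses two hypothetical, unrelated types, Feet and Meters, invented only for this example. The cast syntax is the same as for a downcast, but here it runs the user-defined operator, which allocates a new object:

```csharp
using System;

// Hypothetical types for illustration only.
public sealed class Meters
{
    public double Value { get; private set; }
    public Meters(double value) { Value = value; }
}

public sealed class Feet
{
    public double Value { get; private set; }
    public Feet(double value) { Value = value; }

    // User-defined explicit conversion: ordinary code that builds a new object.
    public static explicit operator Meters(Feet feet)
    {
        return new Meters(feet.Value * 0.3048);
    }
}

public static class ConversionDemo
{
    // Returns true when the cast result is a different instance from its source.
    public static bool ConversionCreatesNewInstance()
    {
        Feet feet = new Feet(10.0);
        Meters meters = (Meters)feet;   // calls the operator above, no type check
        return !ReferenceEquals(feet, meters);
    }

    public static void Main()
    {
        Console.WriteLine(ConversionCreatesNewInstance()); // True
    }
}
```

Compare this with a plain downcast such as (string)someObject, which returns the same instance or throws.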
