Why do some languages need Boxing and Unboxing?

Posted 2020-05-14 07:38

Question:

This is not a question about what boxing and unboxing are, but rather why languages like Java and C# need them.

I am very familiar with C++, the STL and Boost.

In C++ I could write something like this very easily,

std::vector<double> dummy;

I have some experience with Java, but I was really surprised because I had to write something like this,

ArrayList<Double> dummy = new ArrayList<Double>();

My question: why should it be an Object? What is so hard technically about including primitive types when talking about generics?

Answer 1:

What is so hard technically about including primitive types when talking about generics?

In Java's case, it's because of the way generics work. In Java, generics are a compile-time check that prevents you from, say, putting an Image object into an ArrayList<String>. However, Java's generics are implemented with type erasure: the generic type information is discarded by the compiler and is not available at run time. This was done for compatibility reasons, because generics were added fairly late in Java's life. It means that, at run time, an ArrayList<String> is effectively an ArrayList<Object> (or better: just an ArrayList that expects and returns Object in all of its methods), with a cast to String inserted automatically when you retrieve a value.
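A minimal sketch of what erasure means in practice. The class and method names here (ErasureDemo, sameRuntimeClass) are made up for illustration; the point is only that two differently parameterized lists report the same runtime class.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Returns true when both lists share a single runtime class,
    // i.e. the type arguments <String> and <Integer> were erased.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints "true"
    }
}
```

Both variables refer to plain java.util.ArrayList instances; the element type exists only in the compiler's view of the code.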

But since int doesn't derive from Object, you can't put it in an ArrayList that expects (at runtime) Object and you can't cast an Object to int either. This means that the primitive int must be wrapped into a type that does inherit from Object, like Integer.
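To illustrate the wrapping described above, here is a small sketch using the asker's ArrayList<Double>. The class name BoxedListDemo and the helper sum are hypothetical; the comments mark where the compiler inserts boxing and unboxing.

```java
import java.util.ArrayList;
import java.util.List;

class BoxedListDemo {
    // Adds up the elements; each element is a Double object that gets unboxed.
    static double sum(List<Double> values) {
        double total = 0.0;
        for (Double boxed : values) {
            total += boxed; // auto-unboxing, equivalent to boxed.doubleValue()
        }
        return total;
    }

    public static void main(String[] args) {
        List<Double> dummy = new ArrayList<Double>();
        dummy.add(1.5);                 // autoboxing, equivalent to Double.valueOf(1.5)
        dummy.add(Double.valueOf(2.5)); // the same thing written out by hand

        Object element = dummy.get(0);  // what the list really stores is an Object
        System.out.println(element instanceof Double); // prints "true"
        System.out.println(sum(dummy));                // prints "4.0"
    }
}
```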

C#, for example, works differently. Generics in C# are also enforced at runtime, and no boxing is required with a List<int>. Boxing in C# only happens when you store a value type like int in a reference-type variable like object. Since int in C# does inherit from Object, writing object obj = 2 is perfectly valid; however, the int will be boxed, which the compiler does automatically (no Integer reference type is exposed to the user or anything).



Answer 2:

Boxing and unboxing are a necessity born out of the way that languages (like C# and Java) implement their memory allocation strategies.

Certain types are allocated on the stack and others on the heap. In order to treat a stack-allocated type as a heap-allocated type, boxing is required to move the stack-allocated type onto the heap. Unboxing is the reverse process.

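In C#, the types that are typically stack-allocated are called value types (e.g. System.Int32 and System.DateTime), and heap-allocated types are called reference types (e.g. System.Stream and System.String). Strictly speaking, a value type is stored inline wherever it is declared, so a value-type field of a heap object lives on the heap inside that object.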

In some cases it is advantageous to be able to treat a value type like a reference type (reflection is one example) but in most cases, boxing and unboxing are best avoided.



Answer 3:

I believe this is also because primitives do not inherit from Object. Suppose you have a method that should be able to accept anything at all as its parameter, e.g.:

class Printer {
    public void print(Object o) {
        ...
    }
}

You may need to pass a simple primitive value to that method, like:

printer.print(5);

You would not be able to do that without boxing/unboxing, because 5 is a primitive and is not an Object. You could overload the print method for each primitive type to enable such functionality, but it's a pain.
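As a rough sketch of the single-method approach, the class below (PrinterSketch, a hypothetical stand-in for the Printer above, with a made-up describe helper) takes Object and relies on autoboxing for primitive arguments. The runtime class of the argument shows which wrapper the compiler picked.

```java
class PrinterSketch {
    // Hypothetical helper: builds "<wrapper class>: <value>" so the boxing is visible.
    String describe(Object o) {
        return o.getClass().getSimpleName() + ": " + o;
    }

    // One method covers every type, because primitives are boxed at the call site.
    void print(Object o) {
        System.out.println(describe(o));
    }

    public static void main(String[] args) {
        PrinterSketch printer = new PrinterSketch();
        printer.print(5);      // int boxed to Integer, prints "Integer: 5"
        printer.print(2.5);    // double boxed to Double, prints "Double: 2.5"
        printer.print("text"); // already an Object, prints "String: text"
    }
}
```

Without autoboxing, the same call sites would need either explicit Integer.valueOf(5) wrappers or one overload per primitive type.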



Answer 4:

I can only tell you, for Java, why it doesn't support primitive types in generics.

First, there was the problem that every time the question of supporting this came up, it turned into a discussion of whether Java should have primitive types at all, which of course hindered discussion of the actual question.

Second, the main reason not to include it was that they wanted binary backward compatibility, so that the code would run unmodified on a VM not aware of generics. This backward compatibility/migration compatibility reason is also why the existing Collections API now supports generics and stayed the same, and why there isn't (as in C# when they introduced generics) a completely new set of generic-aware collection classes.

The compatibility was achieved using erasure (generic type parameter information is removed at compile time), which is also the reason you get so many unchecked cast warnings in Java.

You could still add reified generics, but it's not that easy. Just keeping the type information at runtime instead of removing it won't work, as it breaks source and binary compatibility (you can't continue to use raw types, and you can't call existing compiled code because it doesn't have the corresponding methods).
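To make the unchecked-cast point concrete, here is a hedged sketch (UncheckedDemo and pollutedReadFails are illustrative names, not from any library). Because the element type is erased, the cast to List<String> cannot be verified at runtime, so the wrong element goes in silently and the failure only shows up later.

```java
import java.util.ArrayList;
import java.util.List;

class UncheckedDemo {
    // Without this annotation, javac reports an "unchecked cast" warning below.
    @SuppressWarnings("unchecked")
    static boolean pollutedReadFails() {
        List<Integer> numbers = new ArrayList<Integer>();
        Object untyped = numbers;

        // Unchecked cast: the runtime only sees "List", so nothing is verified here.
        List<String> strings = (List<String>) untyped;
        strings.add("not a number"); // succeeds, the list just stores an Object

        try {
            Integer first = numbers.get(0); // compiler-inserted cast to Integer fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(pollutedReadFails()); // prints "true"
    }
}
```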

The other approach is the one C# chose: see above

And transparent autoboxing/unboxing wasn't supported for this use case because autoboxing costs too much.

Java theory and practice: Generics gotchas



Answer 5:

Every non-array non-string object stored on the heap contains an 8- or 16-byte header (on 32- and 64-bit systems respectively), followed by the contents of that object's public and private fields. Arrays and strings have the above header, plus some more bytes defining the length of the array and size of each element (and possibly the number of dimensions, length of each extra dimension, etc.), followed by all of the fields of the first element, then all the fields of the second, etc. Given a reference to an object, the system can easily examine the header and determine what type it is.

Reference-type storage locations hold a four- or eight-byte value which uniquely identifies an object stored on the heap. In present implementations, that value is a pointer, but it's easier (and semantically equivalent) to think of it as an "object ID".

Value-type storage locations hold the contents of the value type's fields, but do not have any associated header. If code declares a variable of type Int32, there's no need to store information with that Int32 saying what it is. The fact that that location holds an Int32 is effectively stored as part of the program, and so it doesn't have to be stored in the location itself. This can represent a big savings if, e.g., one has a million objects each of which has a field of type Int32. Each of the objects holding the Int32 has a header which identifies the class that can operate on it. Since one copy of that class code can operate on any of the million instances, having the fact that the field is an Int32 be part of the code is much more efficient than having the storage for every one of those fields include information about what it is.

Boxing is necessary when a request is made to pass the contents of a value-type storage location to code which doesn't know to expect that particular value type. Code which expects objects of unknown type can accept a reference to an object stored on the heap. Since every object stored on the heap has a header identifying what type of object it is, code can use that header whenever it's necessary to use an object in a way which would require knowing its type.

Note that in .NET, it is possible to declare what are called generic classes and methods. Each such declaration automatically generates a family of classes or methods which are identical except for the type of object upon which they expect to act. If one passes an Int32 to a routine DoSomething<T>(T param), that will automatically generate a version of the routine in which every instance of type T is effectively replaced with Int32. That version of the routine will know that every storage location declared as type T holds an Int32, so just as in the case where a routine was hard-coded to use an Int32 storage location, it will not be necessary to store type information with those locations themselves.



Answer 6:

In Java and C# (unlike C++), every class extends Object, so collection classes like ArrayList can hold Object or any of its descendants (basically anything).

For performance reasons, however, primitives in Java, or value types in C#, were given a special status. They are not objects. You cannot do something like this (in Java):

 7.toString()

Even though toString is a method on Object. To bridge the gap left by this nod to performance, equivalent wrapper classes were created. Autoboxing removes the boilerplate of putting a primitive into its wrapper class and taking it out again, making the code more readable.
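A short sketch of the wrapper round trip, with hypothetical names (WrapperDemo, viaWrapper, roundTrip). It shows the explicit wrapper call that replaces the illegal 7.toString(), and the boxing and unboxing that the compiler otherwise inserts for you.

```java
class WrapperDemo {
    // 7.toString() does not compile, but the wrapper object has the method.
    static String viaWrapper(int value) {
        return Integer.valueOf(value).toString();
    }

    // Boxes the int into an Integer and unboxes it again.
    static int roundTrip(int value) {
        Integer boxed = value; // autoboxing, equivalent to Integer.valueOf(value)
        int unboxed = boxed;   // auto-unboxing, equivalent to boxed.intValue()
        return unboxed;
    }

    public static void main(String[] args) {
        System.out.println(viaWrapper(7)); // prints "7"
        System.out.println(roundTrip(7));  // prints "7"
    }
}
```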

The difference between value types and objects in C# is more grey. See here about how they are different.