This is not a question about what boxing and unboxing are; rather, why do languages like Java and C# need them?
I am very familiar with C++, the STL and Boost.
In C++ I could write something like this very easily,
std::vector<double> dummy;
I have some experience with Java, but I was really surprised because I had to write something like this,
ArrayList<Double> dummy = new ArrayList<Double>();
My question: why must it be an Object? What is so technically hard about supporting primitive types in generics?
I believe this is also because primitives do not inherit from Object. Suppose you have a method that should accept anything at all as its parameter, e.g. a print method that takes an Object.
You may need to pass a simple primitive value to that method, such as the literal 5.
You would not be able to do that without boxing/unboxing, because 5 is a primitive and is not an Object. You could overload the print method for each primitive type to enable such functionality, but it's a pain.
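A minimal sketch of the situation described above, assuming a hypothetical PrintDemo class whose print method takes an Object (the names and the returned string are illustrative, not from the original answer):

```java
class PrintDemo {
    // Hypothetical stand-in for the "accepts anything" method described above.
    static String print(Object o) {
        String text = o.getClass().getSimpleName() + ": " + o;
        System.out.println(text);
        return text;
    }

    public static void main(String[] args) {
        print(5);                   // autoboxed: the int is wrapped in an Integer
        print(Integer.valueOf(5));  // the same call with the boxing written out
        print("hello");             // already an Object, no boxing needed
    }
}
```

Before autoboxing arrived in Java 5, only the second form would compile, which is why the overload-per-primitive workaround was so tedious.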
I can only tell you for Java why it doesn't support primitive types in generics.
First, whenever the question of supporting this came up, it triggered the discussion of whether Java should have primitive types at all, which of course hindered discussion of the actual question.
Second, the main reason not to include it was that they wanted binary backward compatibility, so that code using generics would run unmodified on a VM not aware of generics. This backward/migration compatibility is also why the existing Collections API now supports generics and otherwise stayed the same, rather than there being (as in C# when it introduced generics) a completely new set of generic-aware collection classes.
The compatibility was achieved using erasure (generic type parameter info is removed at compile time), which is also the reason you get so many unchecked cast warnings in Java.
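As a rough illustration of erasure (the class and method names here are made up for the sketch), two differently parameterized lists share one runtime class, and a cast to a type parameter cannot be checked at run time:

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        // After erasure both are plain ArrayList; the type argument is gone.
        return strings.getClass() == numbers.getClass();
    }

    @SuppressWarnings("unchecked")
    static <T> T uncheckedCast(Object o) {
        // Without the annotation the compiler warns here: T is unknown at run time.
        return (T) o;
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        // No exception: the VM only sees a List, not a List<String>.
        List<String> strings = uncheckedCast(new ArrayList<Integer>());
        System.out.println(strings.isEmpty());
    }
}
```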
You could still add reified generics, but it's not that easy. Just adding the type info at runtime instead of removing it won't work, as it breaks source and binary compatibility (you can't continue to use raw types, and you can't call existing compiled code because it doesn't have the corresponding methods).
The other approach is the one C# chose: see above
And automatic boxing/unboxing wasn't supported for this use case because autoboxing costs too much.
Java theory and practice: Generics gotchas
In Java's case, it's because of the way generics work. In Java, generics are a compile-time trick that prevents you from putting an Image object into an ArrayList<String>. However, Java's generics are implemented with type erasure: the generic type information is lost at run time. This was for compatibility reasons, because generics were added fairly late in Java's life. This means that, at run time, an ArrayList<String> is effectively an ArrayList<Object> (or better: just ArrayList that expects and returns Object in all of its methods) that automatically casts to String when you retrieve a value.
But since int doesn't derive from Object, you can't put it in an ArrayList that expects (at runtime) Object, and you can't cast an Object to int either. This means that the primitive int must be wrapped into a type that does inherit from Object, like Integer.
C#, for example, works differently. Generics in C# are also enforced at runtime, and no boxing is required with a List<int>. Boxing in C# only happens when you try to store a value type like int in a reference type variable like object. Since int in C# inherits from Object, writing object obj = 2 is perfectly valid; however, the int will be boxed, which is done automatically by the compiler (no Integer reference type is exposed to the user or anything).
Boxing and unboxing are a necessity born out of the way that languages (like C# and Java) implement their memory allocation strategies.
Certain types are allocated on the stack and others on the heap. In order to treat a stack-allocated type as a heap-allocated type, boxing is required to move the stack-allocated type onto the heap. Unboxing is the reverse process.
In C# stack-allocated types are called value types (e.g. System.Int32 and System.DateTime) and heap-allocated types are called reference types (e.g. System.Stream and System.String).
In some cases it is advantageous to be able to treat a value type like a reference type (reflection is one example) but in most cases, boxing and unboxing are best avoided.
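To make the "best avoided" advice concrete, here is a small Java sketch (Java rather than C#, and with hypothetical method names) in which the only difference between two loops is whether the accumulator is a wrapper or a primitive. The boxed version unboxes and re-boxes on every iteration, which may allocate a new object each time:

```java
class BoxingCostDemo {
    static long sumBoxed(int n) {
        Long sum = 0L;                  // wrapper type
        for (long i = 0; i < n; i++) {
            sum += i;                   // unbox, add, box the result again
        }
        return sum;
    }

    static long sumPrimitive(int n) {
        long sum = 0L;                  // primitive: no objects involved
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000));
        System.out.println(sumPrimitive(1000));
    }
}
```

Both methods return the same value; how much slower the boxed one is depends on the VM, so no timing is claimed here.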
Every non-array non-string object stored on the heap contains an 8- or 16-byte header (sizes for 32/64-bit systems), followed by the contents of that object's public and private fields. Arrays and strings have the above header, plus some more bytes defining the length of the array and size of each element (and possibly the number of dimensions, length of each extra dimension, etc.), followed by all of the fields of the first element, then all the fields of the second, etc. Given a reference to an object, the system can easily examine the header and determine what type it is.
Reference-type storage locations hold a four- or eight-byte value which uniquely identifies an object stored on the heap. In present implementations, that value is a pointer, but it's easier (and semantically equivalent) to think of it as an "object ID".
Value-type storage locations hold the contents of the value type's fields, but do not have any associated header. If code declares a variable of type Int32, there's no need to store information with that Int32 saying what it is. The fact that that location holds an Int32 is effectively stored as part of the program, and so it doesn't have to be stored in the location itself. This can represent a big savings if, e.g., one has a million objects each of which has a field of type Int32. Each of the objects holding the Int32 has a header which identifies the class that can operate on it. Since one copy of that class code can operate on any of the million instances, having the fact that the field is an Int32 be part of the code is much more efficient than having the storage for every one of those fields include information about what it is.
Boxing is necessary when a request is made to pass the contents of a value-type storage location to code which doesn't know to expect that particular value type. Code which expects objects of unknown type can accept a reference to an object stored on the heap. Since every object stored on the heap has a header identifying what type of object it is, code can use that header whenever it's necessary to use an object in a way which would require knowing its type.
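The answer above describes .NET, but the same idea can be sketched in Java, assuming a hypothetical describe method that receives a value of unknown type. The int has to be boxed so that the callee can find out at run time what it was handed:

```java
class HeaderDemo {
    // Accepts a reference to any heap object; the concrete type is
    // discovered at run time from the object itself.
    static String describe(Object o) {
        if (o instanceof Integer) {
            int value = (Integer) o;    // unboxing: copy the int back out of the box
            return "Integer holding " + value;
        }
        return o.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(describe(42));      // 42 is boxed before the call
        System.out.println(describe("text"));  // already a heap object
    }
}
```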
Note that in .NET, it is possible to declare what are called generic classes and methods. Each such declaration automatically generates a family of classes or methods which are identical except for the type of object upon which they expect to act. If one passes an Int32 to a routine DoSomething<T>(T param), that will automatically generate a version of the routine in which every instance of type T is effectively replaced with Int32. That version of the routine will know that every storage location declared as type T holds an Int32, so just as in the case where a routine was hard-coded to use an Int32 storage location, it will not be necessary to store type information with those locations themselves.
In Java and C# (unlike C++) every class extends Object, so collection classes like ArrayList can hold Object or any of its descendants (basically anything).
For performance reasons, however, primitives in Java, or value types in C#, were given a special status. They are not objects. You cannot, for example, call toString() directly on an int in Java,
even though toString is a method on Object. To bridge the gap that this nod to performance created, equivalent wrapper objects were created. Autoboxing removes the boilerplate code of having to put a primitive in its wrapper class and take it out again, making the code more readable.
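A short sketch of the kind of call meant above, using hypothetical names. The commented-out line is the form that does not compile. The rest shows the wrapper and the autoboxing that hides it:

```java
import java.util.ArrayList;
import java.util.List;

class WrapperDemo {
    static String viaWrapper(int i) {
        // String s = i.toString();  // does not compile: int is not an object
        Integer boxed = i;            // autoboxing into the wrapper class
        return boxed.toString();      // now an Object method can be called
    }

    static int roundTrip(int i) {
        List<Integer> list = new ArrayList<>();
        list.add(i);                  // autoboxed on the way in
        return list.get(0);           // auto-unboxed on the way out
    }

    public static void main(String[] args) {
        System.out.println(viaWrapper(7));
        System.out.println(roundTrip(7));
    }
}
```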
The difference between value types and objects in C# is more grey. See here about how they are different.