Profiling my C# application indicated that significant time is spent in List<T>.AddRange
. Using Reflector to look at the code in this method indicated that it calls List<T>.InsertRange
which is implemented as such:
public void InsertRange(int index, IEnumerable<T> collection)
{
if (collection == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
}
if (index > this._size)
{
ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.index, ExceptionResource.ArgumentOutOfRange_Index);
}
ICollection<T> is2 = collection as ICollection<T>;
if (is2 != null)
{
int count = is2.Count;
if (count > 0)
{
this.EnsureCapacity(this._size + count);
if (index < this._size)
{
Array.Copy(this._items, index, this._items, index + count, this._size - index);
}
if (this == is2)
{
Array.Copy(this._items, 0, this._items, index, index);
Array.Copy(this._items, (int) (index + count), this._items, (int) (index * 2), (int) (this._size - index));
}
else
{
T[] array = new T[count]; // (*)
is2.CopyTo(array, 0); // (*)
array.CopyTo(this._items, index); // (*)
}
this._size += count;
}
}
else
{
using (IEnumerator<T> enumerator = collection.GetEnumerator())
{
while (enumerator.MoveNext())
{
this.Insert(index++, enumerator.Current);
}
}
}
this._version++;
}
private T[] _items;
One can argue that the simplicity of the interface (only having one overload of InsertRange) justifies the performance overhead of runtime type cheching and casting.
But what could be the reason behind the 3 lines I have indicated with (*)
?
I think it could be rewritten to the faster alternative:
is2.CopyTo(this._items, index);
Do you see any reason for not using this simpler and apparently faster alternative?
Edit:
Thanks for the answers. So consensus opinion is that this is a protective measure against the input collection implementing the CopyTo in a defective/malicious manner. To me it seems like a overkill to constantly pay the price of 1) runtime type checking 2) dynamic allocation of the temporary array 3) double the copy operation, when all this could have been saved by defining 2 or a few more overloads of InsertRange, one getting IEnumerable
as now, the second getting a List<T>
, third getting T[]
. The later two could have been implemented to run around twice as fast as in the current case.
Edit 2:
I did implement a class FastList, identical to List, except that it also provides an overload of AddRange which takes a T[] argument. This overload does not need the dynamic type verification, and double-copying of elements. I did profile this FastList.AddRange against List.AddRange by adding 4-byte arrays 1000 times to a list which was initially emtpy. My implementation beats the speed of standard List.AddRange with a factor of 9 (nine!). List.AddRange takes about 5% of runtime in one of the important usage scenarios of our application, replacing List with a class providing a faster AddRange could improve application runtime by 4%.
They are preventing the implementation of
ICollection<T>
from accessing indices of the destination list outside the bounds of insertion. The implementation above results in anIndexOutOfBoundsException
if a faulty (or "manipulative") implementation ofCopyTo
is called.Keep in mind that
T[].CopyTo
is quite literally internally implemented asmemcpy
, so the performance overhead of adding that line is minute. When you have such a low cost of adding safety to a tremendous number of calls, you might as well do so.Edit: The part I find strange is the fact that the call to
ICollection<T>.CopyTo
(copying to the temporary array) does not occur immediately following the call toEnsureCapacity
. If it were moved to that location, then following any synchronous exception the list would remain unchanged. As-is, that condition only holds if the insertion happens at the end of the list. The reasoning here is:Array.Copy
cannot fail becauseICollection.CopyTo
and the allocations required for resizing the list and allocating the temporary array. If all three of these occur before moving elements for the insertion, the transaction to change the list cannot throw a synchronous exception.Edit 2 (response to the OP's edit): Have you profiled this? You are making some bold claims that Microsoft should have chosen a more complicated API, so you should make sure you're correct in the assertions that the current method is slow. I've never had a problem with the performance of
InsertRange
, and I'm quite sure that any performance problems someone does face with it will be better resolved with an algorithm redesign than by reimplementing the dynamic list. Just so you don't take me as being harsh in a negative way, keep the following in mind:don't wantcan't stand people on my dev team that like to reinvent the square wheel.It's a good question, I'm struggling to come up with a reason. There's no hint in the Reference Source. One possibility is that they try to avoid a problem when the class that implements the ICollection<>.CopyTo() method objects against copying to a start index other than 0. Or as a security measure, preventing the collection from messing with the array elements it should not have access to.
Another one is that this is a counter-measure when the collection is used in thread-unsafe manner. If an item got added to the collection by another thread it will be the collection class' CopyTo() method that fails, not the Microsoft code. The right person will get the service call.
These are not great explanations.
There is a problem with your solution if you think about it for a minute, if you change the code in that way you are essentially giving the collection that should be added access to an internal datastructure.
This is not a good idea, for example if the author of the List datastructure figures out a better underlying structure to store the data than an array there is no way to change the implementation of List since all collection are expecting an array into the CopyTo function.
In essence you would be cementing the implementation of the List class, even though object oriented programming tells us that the internal implementation of a datastructure should be something that can be changed without breaking other code.