So pretty much every question about capacity in ArrayList is about how to use it or (oddly) how to access it, and I am quite familiar with that information. What I am interested in is whether it is actually worth using the ArrayList constructor that sets the capacity when you know, or have a rough idea of, how many items the ArrayList will hold.
Are there any comprehensive benchmarks comparing how long it takes to simply add elements to a default ArrayList versus one whose capacity has been preset?
Obviously for any specific application you'd have to test any performance adjustments to determine if they are in fact optimizations (and if they are in fact necessary), but there are some cases where setting the capacity explicitly can be worthwhile. For example:
- You're creating a very large number of array-lists, most of which will be very small. In this case, you might want to set the initial capacity very low, and/or to trim the capacity (with `trimToSize()`) whenever you're done populating a given list. (In this case, the optimization is less a matter of speed than of memory usage. But note that the list itself has memory overhead, as does the array it contains, so in this sort of situation it's likely to be better to redesign in such a way as to have fewer lists.)
- You're creating an array-list of a very large known size, and you want the time to add each element to be very small (perhaps because each time you add an element, you have to send some response to an external data source). (The default geometric growth gives amortized constant time: every once in a while a large penalty is incurred when the backing array is reallocated and copied, so the overall average performance is perfectly fine, but if you care about the cost of each insertion on its own, that might not be good enough.)
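Both situations map onto real ArrayList methods: the `ArrayList(int)` constructor, `trimToSize()` and `ensureCapacity(int)`. Below is a minimal sketch; the class name, the helper methods, the capacity hint of 2 and the element count of 1,000,000 are hypothetical choices for illustration. Timing individual calls with `System.nanoTime()` is noisy (GC pauses and `Integer` boxing also cause spikes), so treat the printed numbers as a rough indication only.

```java
import java.util.ArrayList;
import java.util.List;

public class CapacityHintsSketch {
    // Case 1 (many small lists): start each list with a small capacity hint and
    // trim it once populated. Hypothetical helper; the hint of 2 is an assumption.
    public static List<ArrayList<String>> buildGroups(String[][] groups) {
        // The number of groups is known up front, so the outer list is presized too.
        List<ArrayList<String>> result = new ArrayList<>(groups.length);
        for (String[] group : groups) {
            ArrayList<String> list = new ArrayList<>(2);
            for (String s : group) {
                list.add(s);
            }
            list.trimToSize(); // shrink the backing array to exactly size() slots
            result.add(list);
        }
        return result;
    }

    // Case 2 (large known size): adds n elements and returns the slowest
    // single add() call, in nanoseconds.
    public static long slowestAdd(ArrayList<Integer> list, int n) {
        long worst = 0;
        for (int i = 0; i < n; i++) {
            long t0 = System.nanoTime();
            list.add(i); // autoboxes i; values outside the Integer cache allocate
            long elapsed = System.nanoTime() - t0;
            worst = Math.max(worst, elapsed);
        }
        return worst;
    }

    public static void main(String[] args) {
        String[][] groups = { {"a"}, {"b", "c"}, {}, {"d", "e", "f"} };
        System.out.println(buildGroups(groups));

        int n = 1_000_000; // assumed "very large known size"
        ArrayList<Integer> presized = new ArrayList<>(n); // backing array allocated once
        ArrayList<Integer> grown = new ArrayList<>();     // grows geometrically as needed
        System.out.println("presized, slowest add: " + slowestAdd(presized, n) + " ns");
        System.out.println("default,  slowest add: " + slowestAdd(grown, n) + " ns");
    }
}
```

For a list that already exists, calling `ensureCapacity(n)` before a bulk of adds grows the backing array in one step, which has the same effect as passing the capacity to the constructor.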
I have nothing substantial to add to ruakh's answer, but here's a quick test function. I keep a scrap project around for writing little tests like these. Adjust `sourceSize` to something representative of your data, and you can get a rough idea of the magnitude of the effect. With the settings shown, the pre-sized lists filled about twice as fast.
```java
import java.util.ArrayList;
import java.util.Random;

public class ALTest {
    // Appends every byte of source to al; returns the elapsed wall-clock time in ms.
    public static long fill(ArrayList<Byte> al, byte[] source) {
        long start = System.currentTimeMillis();
        for (byte b : source) {
            al.add(b);
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        int sourceSize = 1 << 20; // 1 MB
        int smallIter = 50;
        int bigIter = 4;
        Random r = new Random();
        byte[] source = new byte[sourceSize];
        for (int i = 0; i < bigIter; i++) {
            r.nextBytes(source);
            // The variants run in the order With, Without, Without, With,
            // which helps keep warm-up effects from favouring one of them.
            {
                long time = 0;
                for (int j = 0; j < smallIter; j++) {
                    ArrayList<Byte> al = new ArrayList<Byte>(sourceSize); // pre-sized
                    time += fill(al, source);
                }
                System.out.print("With: " + time + "ms\t");
            }
            {
                long time = 0;
                for (int j = 0; j < smallIter; j++) {
                    ArrayList<Byte> al = new ArrayList<Byte>(); // default capacity
                    time += fill(al, source);
                }
                System.out.print("Without: " + time + "ms\t");
            }
            {
                long time = 0;
                for (int j = 0; j < smallIter; j++) {
                    ArrayList<Byte> al = new ArrayList<Byte>(); // default capacity
                    time += fill(al, source);
                }
                System.out.print("Without: " + time + "ms\t");
            }
            {
                long time = 0;
                for (int j = 0; j < smallIter; j++) {
                    ArrayList<Byte> al = new ArrayList<Byte>(sourceSize); // pre-sized
                    time += fill(al, source);
                }
                System.out.print("With: " + time + "ms");
            }
            System.out.println();
        }
    }
}
```
Output:
```
With: 401ms    Without: 799ms    Without: 731ms    With: 347ms
With: 358ms    Without: 744ms    Without: 749ms    With: 342ms
With: 348ms    Without: 719ms    Without: 739ms    With: 347ms
With: 339ms    Without: 734ms    Without: 774ms    With: 358ms
```
Internally, ArrayList stores its elements in a plain array. If the number of elements exceeds the capacity of that array, ArrayList has to allocate a larger array and copy the existing elements into it. So if you know how many items your List will contain, you can tell ArrayList to start with an array of the needed size, and the resize logic will never need to run.
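To get a feel for how often that resize logic runs, here is a small model of it. It assumes OpenJDK's growth rule (the new capacity is the old capacity plus half of it), a starting capacity of 10, and elements added one at a time; these are implementation details rather than guarantees of the ArrayList specification, and the model ignores the initial allocation. `ResizeModel` and its methods are hypothetical names used only for this sketch.

```java
public class ResizeModel {
    // ASSUMPTION: models OpenJDK's growth rule (new = old + old/2, but at least
    // old + 1) for elements added one at a time. Not guaranteed by the Java spec.

    // Number of times the backing array is reallocated while adding n elements.
    public static int countResizes(int n, int initialCapacity) {
        long capacity = initialCapacity;
        int resizes = 0;
        while (capacity < n) {
            capacity = Math.max(capacity + (capacity >> 1), capacity + 1);
            resizes++;
        }
        return resizes;
    }

    // Total element copies: each resize copies the whole (full) old array.
    public static long countCopies(int n, int initialCapacity) {
        long capacity = initialCapacity;
        long copies = 0;
        while (capacity < n) {
            copies += capacity;
            capacity = Math.max(capacity + (capacity >> 1), capacity + 1);
        }
        return copies;
    }

    public static void main(String[] args) {
        int n = 1 << 20; // same element count as the ALTest benchmark
        System.out.println("start at 10: " + countResizes(n, 10) + " resizes, "
                + countCopies(n, 10) + " element copies");
        System.out.println("start at n:  " + countResizes(n, n) + " resizes, "
                + countCopies(n, n) + " element copies");
    }
}
```

With an initial capacity equal to n the loop never runs, which is exactly the saving the constructor argument buys; in the default case every one of those copies is extra memory traffic on top of the adds themselves.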