Everybody says one should use ArrayList instead of Vector because of performance (since Vector synchronizes on every operation). I've written a simple test:
import java.util.ArrayList;
import java.util.Date;
import java.util.Vector;

public class ComparePerformance {
    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<Integer>();
        Vector<Integer> vector = new Vector<Integer>();
        int size = 10000000;
        int listSum = 0;
        int vectorSum = 0;

        long startList = new Date().getTime();
        for (int i = 0; i < size; i++) {
            list.add(new Integer(1));
        }
        for (Integer integer : list) {
            listSum += integer;
        }
        long endList = new Date().getTime();
        System.out.println("List time: " + (endList - startList));

        long startVector = new Date().getTime();
        for (int i = 0; i < size; i++) {
            vector.add(new Integer(1));
        }
        for (Integer integer : vector) { // iterate the vector, not the list
            vectorSum += integer;
        }
        long endVector = new Date().getTime();
        System.out.println("Vector time: " + (endVector - startVector));
    }
}
The results are as follows:
List time: 4360
Vector time: 4103
Based on this it seems that Vector's performance at iterating and reading is slightly better. Maybe this is a dumb question or I've made wrong assumptions - can somebody please explain this?
You have written a naïve microbenchmark. Microbenchmarking on the JVM is very tricky business and it is not even easy to enumerate all the pitfalls, but here are some classic ones:
- you must warm up the code;
- you must control for garbage collection pauses;
- System.currentTimeMillis is imprecise, but you don't seem to be aware of even this method (your new Date().getTime() is equivalent, but slower).

If you want to do this properly, then check out Oracle's jmh tool or Google's Caliper.
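If you do time things by hand, System.nanoTime plus an explicit warm-up phase at least avoids the worst pitfalls. A minimal sketch (the warm-up and list sizes here are arbitrary assumptions, not tuned values):

```java
import java.util.ArrayList;
import java.util.List;

public class NanoTiming {
    // Fill a list with a fixed number of boxed values.
    static List<Integer> fill(List<Integer> list) {
        for (int i = 0; i < 1_000_000; i++) {
            list.add(Integer.valueOf(1));
        }
        return list;
    }

    public static void main(String[] args) {
        // Warm up: run the hot path a few times so the JIT compiles it
        // before we start the clock.
        for (int i = 0; i < 10; i++) {
            fill(new ArrayList<Integer>());
        }
        long start = System.nanoTime();
        List<Integer> list = fill(new ArrayList<Integer>());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Filled " + list.size() + " elements in " + elapsedMs + " ms");
    }
}
```

This still doesn't control for GC pauses or dead-code elimination, which is exactly why a framework is the better choice.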
My Test Results
Since I was kind of interested to see these numbers myself, here is the output of jmh
. First, the test code:
import static java.util.Arrays.asList;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

import org.openjdk.jmh.annotations.GenerateMicroBenchmark;

public class Benchmark1
{
    static Integer[] ints = new Integer[0];
    static {
        final List<Integer> list = new ArrayList<Integer>(asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
        for (int i = 0; i < 5; i++) list.addAll(list);
        ints = list.toArray(ints);
    }
    static List<Integer> intList = Arrays.asList(ints);
    static Vector<Integer> vec = new Vector<Integer>(intList);
    static List<Integer> list = new ArrayList<Integer>(intList);

    @GenerateMicroBenchmark
    public Vector<Integer> testVectorAdd() {
        final Vector<Integer> v = new Vector<Integer>();
        for (Integer i : ints) v.add(i);
        return v;
    }

    @GenerateMicroBenchmark
    public long testVectorTraverse() {
        long sum = (long) (Math.random() * 10); // random start defeats constant folding
        for (int i = 0; i < vec.size(); i++) sum += vec.get(i);
        return sum;
    }

    @GenerateMicroBenchmark
    public List<Integer> testArrayListAdd() {
        final List<Integer> l = new ArrayList<Integer>();
        for (Integer i : ints) l.add(i);
        return l;
    }

    @GenerateMicroBenchmark
    public long testArrayListTraverse() {
        long sum = (long) (Math.random() * 10); // random start defeats constant folding
        for (int i = 0; i < list.size(); i++) sum += list.get(i);
        return sum;
    }
}
And the results:
testArrayListAdd 234.896 ops/msec
testVectorAdd 274.886 ops/msec
testArrayListTraverse 1718.711 ops/msec
testVectorTraverse 34.843 ops/msec
Note the following:
- in the ...add methods I am creating a new, local collection. The JIT compiler uses this fact and elides the locking on Vector methods, hence almost equal performance;
- in the ...traverse methods I am reading from a global collection; the locks cannot be elided, and this is where the true performance penalty of Vector shows up.
The main takeaway from this should be: the performance model on the JVM is highly complex, sometimes even erratic. Extrapolating from microbenchmarks, even when they are done with all due care, can lead to dangerously wrong predictions about production system performance.
I agree with Marko about using Caliper; it's an awesome framework.
But you can get a part of it done yourself if you organize your benchmark a bit better:
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.TimeUnit;

public class ComparePerformance {
    private static final int SIZE = 1000000;
    private static final int RUNS = 500;
    private static final Integer ONE = Integer.valueOf(1);

    static class Run {
        private final List<Integer> list;

        Run(final List<Integer> list) {
            this.list = list;
        }

        public long perform() {
            long oldNanos = System.nanoTime();
            for (int i = 0; i < SIZE; i++) {
                list.add(ONE);
            }
            return System.nanoTime() - oldNanos;
        }
    }

    public static void main(final String[] args) {
        long arrayListTotal = 0L;
        long vectorTotal = 0L;
        for (int i = 0; i < RUNS; i++) {
            if (i % 50 == 49) {
                System.out.println("Run " + (i + 1));
            }
            arrayListTotal += new Run(new ArrayList<Integer>()).perform();
            vectorTotal += new Run(new Vector<Integer>()).perform();
        }
        System.out.println();
        System.out.println("Runs: " + RUNS + ", list size: " + SIZE);
        output(arrayListTotal, "List");
        output(vectorTotal, "Vector");
    }

    private static void output(final long value, final String name) {
        System.out.println(name + " total time: " + value + " (" + TimeUnit.NANOSECONDS.toMillis(value) + " ms)");
        long avg = value / RUNS;
        System.out.println(name + " average time: " + avg + " (" + TimeUnit.NANOSECONDS.toMillis(avg) + " ms)");
    }
}
The key part is running your code often, so the JIT gets a chance to compile it. Also, remove anything that's unrelated to your benchmark, and re-use Integers instead of creating new ones.
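On the "re-use Integers" point: Integer.valueOf caches boxed values in at least the range -128 to 127 (this range is guaranteed by the language spec), so boxing the same small value repeatedly allocates nothing, whereas new Integer(...) allocates a fresh object on every call:

```java
public class IntegerCache {
    public static void main(String[] args) {
        // Cached range: valueOf returns the same object every time.
        System.out.println(Integer.valueOf(1) == Integer.valueOf(1)); // true
        // The constructor always creates a new object.
        System.out.println(new Integer(1) == new Integer(1));         // false
    }
}
```

Using a constant like ONE above keeps allocation noise out of the measurement entirely.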
The above benchmark code creates this output on my machine:
Runs: 500, list size: 1000000
List total time: 3524708559 (3524 ms)
List average time: 7049417 (7 ms)
Vector total time: 6459070419 (6459 ms)
Vector average time: 12918140 (12 ms)
I'd say that should give you an idea of the performance differences.
As Marko Topolnik said, it is hard to write correct microbenchmarks and to interpret the results correctly. There are good articles about this subject available.
From my experience and what I know of the implementation I use this rule of thumb:
- Use ArrayList
- If the collection must be synchronized, consider using Vector. (I never end up using it, because there are other solutions for synchronization, concurrency and parallel programming.)
- If there are many elements in the collection and there are frequent insert or remove operations inside the list (not at the end) then use LinkedList
Most collections do not contain many elements, and it would be a waste of time to spend more effort on them. Also, in Scala there are parallel collections, which perform some operations in parallel. Maybe there is something available for use in pure Java, too.
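There is, at least since Java 8: parallel streams split bulk operations across the common fork-join pool. A quick sketch of summing a list in parallel (the list contents here are arbitrary):

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelSum {
    // Sum the elements using the parallel stream machinery.
    static long sum(List<Integer> list) {
        return list.parallelStream().mapToLong(Integer::longValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>();
        for (int i = 1; i <= 1000; i++) list.add(i);
        System.out.println(sum(list)); // 500500
    }
}
```

As with the benchmarks above, parallelism only pays off for large collections; for small ones the coordination overhead dominates.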
Whenever possible, use the List interface to hide implementation details, and try to add comments that explain WHY you've chosen a specific implementation.
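Programming to the interface keeps the choice of implementation a one-line change; a minimal illustration (the sum helper is just an example consumer):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class InterfaceDemo {
    // Callers depend only on List, so the implementation can change freely.
    static int sum(List<Integer> xs) {
        int total = 0;
        for (int x : xs) total += x;
        return total;
    }

    public static void main(String[] args) {
        // ArrayList chosen for fast random access; swap in LinkedList if
        // frequent mid-list inserts dominate - callers are unaffected.
        List<Integer> xs = new ArrayList<Integer>();
        xs.add(1); xs.add(2); xs.add(3);
        System.out.println(sum(xs));                          // 6
        System.out.println(sum(new LinkedList<Integer>(xs))); // 6
    }
}
```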
I ran your test, and ArrayList is faster than Vector with a size of 1000000:
import java.util.ArrayList;
import java.util.Vector;

public class ComparePerformance {
    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<Integer>();
        Vector<Integer> vector = new Vector<Integer>();
        int size = 1000000;
        int listSum = 0;
        int vectorSum = 0;

        long startList = System.nanoTime();
        for (int i = 0; i < size; i++) {
            list.add(Integer.valueOf(1));
        }
        for (Integer integer : list) {
            listSum += integer;
        }
        long endList = System.nanoTime();
        System.out.println("List time: " + (endList - startList) / 1000000);

        // long startVector = System.nanoTime();
        // for (int i = 0; i < size; i++) {
        //     vector.add(Integer.valueOf(1));
        // }
        // for (Integer integer : vector) {
        //     vectorSum += integer;
        // }
        // long endVector = System.nanoTime();
        // System.out.println("Vector time: " + (endVector - startVector) / 1000000);
    }
}
Output from running it several times:

List time: 83
Vector time: 113