Java for loop performance question

2020-02-09 02:22发布

问题:

considering this example:

public static void main(final String[] args) {
    final List<String> myList = Arrays.asList("A", "B", "C", "D");
    final long start = System.currentTimeMillis();
    for (int i = 1000000; i > myList.size(); i--) {
        System.out.println("Hello");
    }
    final long stop = System.currentTimeMillis();
    System.out.println("Finish: " + (stop - start));
}

vs

public static void main(final String[] args) {
    final List<String> myList = Arrays.asList("A", "B", "C", "D");
    final long start = System.currentTimeMillis();
    final int size = myList.size();
    for (int i = 1000000; i > size; i--) {
        System.out.println("Hello");
    }
    final long stop = System.currentTimeMillis();
    System.out.println("Finish: " + (stop - start));
}

Will this make any diffrence ? On my machine the second one seems to perform faster but i don't know if it is really accurate. Will the compiler optimze this code ? I could think that he would if the loop condition is an immutable object (e.g. String array).

回答1:

If you want to test something like this, you really must optimize your microbenchmark to measure what you care about.

First, make the loop inexpensive but impossible to skip. Computing a sum usually does the trick.

Second, compare the two timings.

Here's some code that does both:

import java.util.*;

public class Test {

public static long run1() {
  final List<String> myList = Arrays.asList("A", "B", "C", "D");
  final long start = System.nanoTime();
  int sum = 0;
  for (int i = 1000000000; i > myList.size(); i--) sum += i;
  final long stop = System.nanoTime();
  System.out.println("Finish: " + (stop - start)*1e-9 + " ns/op; sum = " + sum);
  return stop-start;
}

public static long run2() {
  final List<String> myList = Arrays.asList("A", "B", "C", "D");
  final long start = System.nanoTime();
  int sum = 0;
  int limit = myList.size();
  for (int i = 1000000000; i > limit; i--) sum += i;
  final long stop = System.nanoTime();
  System.out.println("Finish: " + (stop - start)*1e-9 + " ns/op; sum = " + sum);
  return stop-start;
}

public static void main(String[] args) {
  for (int i=0 ; i<5 ; i++) {
    long t1 = run1();
    long t2 = run2();
    System.out.println("  Speedup = " + (t1-t2)*1e-9 + " ns/op\n");
  }
}

}

And if we run it, on my system we get:

Finish: 0.481741256 ns/op; sum = -243309322
Finish: 0.40228402 ns/op; sum = -243309322
  Speedup = 0.079457236 ns/op

Finish: 0.450627151 ns/op; sum = -243309322
Finish: 0.43534661700000005 ns/op; sum = -243309322
  Speedup = 0.015280534 ns/op

Finish: 0.47738474700000005 ns/op; sum = -243309322
Finish: 0.403698331 ns/op; sum = -243309322
  Speedup = 0.073686416 ns/op

Finish: 0.47729349600000004 ns/op; sum = -243309322
Finish: 0.405540508 ns/op; sum = -243309322
  Speedup = 0.071752988 ns/op

Finish: 0.478979617 ns/op; sum = -243309322
Finish: 0.36067492700000003 ns/op; sum = -243309322
  Speedup = 0.11830469 ns/op

which means that the overhead of the method call is approximately 0.1 ns. If your loop does things that take no more than 1-2 ns, then you should care about this. Otherwise, don't.



回答2:

Personally, I don't think you can draw any meaningful conclusions from a contrived example like this.

But if you really want to know, why not use javap to decompile the code and see what's different? Why guess about what the compiler's doing when you can see for yourself without asking here?

Byte code for the first case:

public class Stackoverflow extends java.lang.Object{
public Stackoverflow();
  Code:
   0:   aload_0
   1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
   4:   return

public static void main(java.lang.String[]);
  Code:
   0:   iconst_4
   1:   anewarray       #2; //class java/lang/String
   4:   dup
   5:   iconst_0
   6:   ldc     #3; //String A
   8:   aastore
   9:   dup
   10:  iconst_1
   11:  ldc     #4; //String B
   13:  aastore
   14:  dup
   15:  iconst_2
   16:  ldc     #5; //String C
   18:  aastore
   19:  dup
   20:  iconst_3
   21:  ldc     #6; //String D
   23:  aastore
   24:  invokestatic    #7; //Method java/util/Arrays.asList:([Ljava/lang/Object;)Ljava/util/List
   27:  astore_1
   28:  invokestatic    #8; //Method java/lang/System.currentTimeMillis:()J
   31:  lstore_2
   32:  ldc     #9; //int 1000000
   34:  istore  4
   36:  iload   4
   38:  aload_1
   39:  invokeinterface #10,  1; //InterfaceMethod java/util/List.size:()I
   44:  if_icmple       61
   47:  getstatic       #11; //Field java/lang/System.out:Ljava/io/PrintStream;
   50:  ldc     #12; //String Hello
   52:  invokevirtual   #13; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   55:  iinc    4, -1
   58:  goto    36
   61:  invokestatic    #8; //Method java/lang/System.currentTimeMillis:()J
   64:  lstore  4
   66:  getstatic       #11; //Field java/lang/System.out:Ljava/io/PrintStream;
   69:  new     #14; //class java/lang/StringBuilder
   72:  dup
   73:  invokespecial   #15; //Method java/lang/StringBuilder."<init>":()V
   76:  ldc     #16; //String Finish:
   78:  invokevirtual   #17; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/la
   81:  lload   4
   83:  lload_2
   84:  lsub
   85:  invokevirtual   #18; //Method java/lang/StringBuilder.append:(J)Ljava/lang/StringBuilder;
   88:  invokevirtual   #19; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   91:  invokevirtual   #13; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   94:  return
}

Byte code for the second case:

public class Stackoverflow extends java.lang.Object{
public Stackoverflow();
  Code:
   0:   aload_0
   1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
   4:   return

public static void main(java.lang.String[]);
  Code:
   0:   iconst_4
   1:   anewarray       #2; //class java/lang/String
   4:   dup
   5:   iconst_0
   6:   ldc     #3; //String A
   8:   aastore
   9:   dup
   10:  iconst_1
   11:  ldc     #4; //String B
   13:  aastore
   14:  dup
   15:  iconst_2
   16:  ldc     #5; //String C
   18:  aastore
   19:  dup
   20:  iconst_3
   21:  ldc     #6; //String D
   23:  aastore
   24:  invokestatic    #7; //Method java/util/Arrays.asList:([Ljava/lang/Object;)Ljava/util/List;
   27:  astore_1
   28:  invokestatic    #8; //Method java/lang/System.currentTimeMillis:()J
   31:  lstore_2
   32:  aload_1
   33:  invokeinterface #9,  1; //InterfaceMethod java/util/List.size:()I
   38:  istore  4
   40:  ldc     #10; //int 1000000
   42:  istore  5
   44:  iload   5
   46:  iload   4
   48:  if_icmple       65
   51:  getstatic       #11; //Field java/lang/System.out:Ljava/io/PrintStream;
   54:  ldc     #12; //String Hello
   56:  invokevirtual   #13; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   59:  iinc    5, -1
   62:  goto    44
   65:  invokestatic    #8; //Method java/lang/System.currentTimeMillis:()J
   68:  lstore  5
   70:  getstatic       #11; //Field java/lang/System.out:Ljava/io/PrintStream;
   73:  new     #14; //class java/lang/StringBuilder
   76:  dup
   77:  invokespecial   #15; //Method java/lang/StringBuilder."<init>":()V
   80:  ldc     #16; //String Finish:
   82:  invokevirtual   #17; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   85:  lload   5
   87:  lload_2
   88:  lsub
   89:  invokevirtual   #18; //Method java/lang/StringBuilder.append:(J)Ljava/lang/StringBuilder;
   92:  invokevirtual   #19; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   95:  invokevirtual   #13; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   98:  return
}

There are differences, but I'm not sure I can make a definitive statement about their effect on performance.

I'd code the second one, because it would mean (on the face of it) one method call as opposed to one per loop iteration. I don't know if the compiler can optimize it away, but I'm certain that I can do it pretty easily. So I do, regardless of its effect on wall time.



回答3:

I once worked on a project where my first task was to track down some insanely slow code (it was on a brand new 486 machine and it took about 20 minutes to execute):

for(size_t i = 0; i < strlen(data); i++)
{
    // do something with data[i]
}

The solution was (got it down to something like two minutes or less):

size_t length = strlen(data);

for(int i = 0; i < length; i++)
{
    // do something with data[i]
}

The issue is that "data" was over 1 million characters, and strlen has to count each one all the time.

In the case of Java the "size()" method probably returns a variable, and as such, the VM will inline it. On a VM like the one on Android it probably does not. So the answer is "it depends".

My personal preference is to never call a method more than one time if it is supposed to return the same result each time. That way if the method does involve a calculation it is performed only one time and then it is never an issue.



回答4:

Note that the javac compiler has about nothing to do with optimization. The "important" compiler is the JIT compiler which lives within the JVM.

In your example, in the most generic case, the myList.size() is a simple method dispatch, which returns the contents of a field in the List instance. This is negligible work compared to what is implied by System.out.println("Hello") (at least one system call, hence hundreds of clock cycles, compared to no more than a dozen for the method dispatch). I very much doubt that your code could exhibit a meaningful difference in speed.

On a more general basis, the JIT compiler should recognize this call to size() as a call to a known instance, so that it may perform the method dispatch with a direct function call (which is faster), or even inline the size() method call, reducing the call to a simple instance field access.



回答5:

It can't optimize it, because mylist.size() could change during loop execution. Even if it's final, this just means that the reference is final (meaning you can't reassign myList to some other object), but methods on myList, such as remove() and add() are still available. Final does not make object immutable.



回答6:

The second one should be faster because the .size() does not have to be called every time the loop is performed. Its much faster to say 1+2=3 once than saying it many times.



回答7:

It makes sense that the second implementation is faster, because you store a single, final, local copy of the variable. The compiler would have to figure out that the size can't change inside the loop in order for the performance to be roughly equivalent.

One question is -- does this kind of micro-optimization really matter? If it does, go with what is running faster in your tests and doesn't rely on a compiler optimization.



回答8:

Almost certainly what you are seeing here is a difference in HotSpot inlining. With a simpler loop it is more likely to inline, and therefore get rid of all the redundant rubbish. It might do the same inlining, but do it earlier on or with less effort. Generally with Java microbenchmarks you should run the code multiple times, from which you can work out start up times, average times and deviations.



回答9:

Java compiler would have optimized it so, but didn't do so by seeing the funny condition. If you wrote it like this there would be no issue.

for (int i = myList.size(); i < 1000000; i--) {
    System.out.println("Hello");
}


回答10:

In cases of "compiler optimization", the best you can do is for-each loops:

for(final String x : myList) { ... }

Which lets the compiler provide the fastest implementation.

Edit:

The difference between your code examples is in the second argument of the for-loop. In the first example, the VM will do a method call (more expensive) and is thus slower (only significant when there are a lot of iterations). In your second example, the VM will do a stack pop (less expensive, and local variables are on the stack), and thus faster (only significant when there are a lot of iterations: for just one iteration, the first one is faster, in terms of memory usage).

Also: "Premature optimization is the root of all evil." Donald Knuth's infamous law.



回答11:

As always with such things, you're going to have to run them both to see which is faster given the implementation that you're using. However, the first one has the potential performance penalty of having to call size() every iteration, and a function call is more expensive than simply checking the variable directly. However, it's possible that that function call might be optimized away depending on your code and what the compiler does, so you'd have to run tests to see.

However, as was pointed out by Pindatjuh, it's better to use a foreach loop when you're going to be iterating over the whole collection like that. It should let the compiler optimize things better and is less error-prone.



回答12:

The difference is one method call less for each iteration, so the second version should run slightly faster. Although if you use Just-In-Time compiler, he may optimize that - figuring out that it doesn't change during the loop. Standard Java implementation features JIT, but not every Java implementation does.



回答13:

With the last example you will not need to resolve the current size of the array so it will be slightly faster then the first example.

Just remember that this is only useful if you don't change the number of values in your array.

In Android it is recommended to use the latest example in there example, Designing for Performance. http://developer.android.com/guide/practices/design/performance.html#foreach