I've been looking at some of the Java primitive collections (Trove, fastutil, HPPC) and I've noticed a pattern: fields of the class are sometimes copied into final local variables. For example:
    public void forEach(IntIntProcedure p) {
        final boolean[] used = this.used;
        final int[] key = this.key;
        final int[] value = this.value;
        for (int i = 0; i < used.length; i++) {
            if (used[i]) {
                p.apply(key[i], value[i]);
            }
        }
    }
I've done some benchmarking, and it appears that it is slightly faster when doing this, but why is this the case? I'm trying to understand what Java would do differently if the first three lines of the function were commented out.
Note: This seems similar to this question, but that was for C++ and doesn't address why they are declared final.

The final keyword is a red herring here. The performance difference arises because the two versions say different things. The version with final boolean[] used = this.used is saying, "fetch a boolean array, and for each element of that array do something."
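A minimal sketch of why the two forms are not equivalent. The names Holder, visitViaField and visitViaLocal are hypothetical, and IntConsumer stands in for the library's procedure type. The callback replaces the used field during iteration: the loop that reads the field each time switches to the new array, while the loop that took a local copy keeps walking the original one.

```java
import java.util.function.IntConsumer;

class Holder {
    boolean[] used = {true, true, true, true};

    // Reads the field on every bound check and every element access.
    int visitViaField(IntConsumer p) {
        int visited = 0;
        for (int i = 0; i < used.length; i++) {
            if (used[i]) {
                p.accept(i);
                visited++;
            }
        }
        return visited;
    }

    // Copies the field once; later reassignments of the field are not seen by the loop.
    int visitViaLocal(IntConsumer p) {
        final boolean[] used = this.used;
        int visited = 0;
        for (int i = 0; i < used.length; i++) {
            if (used[i]) {
                p.accept(i);
                visited++;
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Holder a = new Holder();
        int viaField = a.visitViaField(i -> a.used = new boolean[1]);

        Holder b = new Holder();
        int viaLocal = b.visitViaLocal(i -> b.used = new boolean[1]);

        System.out.println(viaField + " " + viaLocal); // prints "1 4"
    }
}
```

Because the callback is allowed to do this, a compiler that wants to treat the field version like the local version first has to prove that nothing reachable from the callback reassigns the field.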
Without final boolean[] used, the function is saying, "while the index is less than the length of the current value of the used field of the current object, fetch the current value of the used field of the current object and do something with the element at index i."

The JIT might have a much easier time proving loop bound invariants to eliminate excess bounds checks and so on, because it can much more easily determine what would cause the value of used to change. Even ignoring multiple threads, if p.apply could change the value of used, then the JIT can't eliminate bounds checks or do other useful optimizations.

Accessing a local variable or parameter is a single-step operation: take the variable located at offset N on the stack. If your function has 2 arguments (simplified), its frame holds this first, then the two arguments, then any local variables, each at a fixed offset.
So when you access a local variable, you have one memory access at a fixed offset (N is known at compilation time). The bytecode for accessing the first method argument (an int) is a single iload_1 instruction.

However, when you access a field, you are actually performing an extra step. First you read the "local variable" this just to determine the current object's address. Then you load the field (getfield), which sits at a fixed offset from this. So you perform two memory operations instead of one. In bytecode, that is aload_0 followed by getfield.

So technically, accessing local variables and parameters is faster than accessing object fields. In practice, many other factors may affect performance (including the various levels of CPU cache and JVM optimizations).
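The difference in bytecode can be checked on a small class. This is only a sketch: AccessDemo and its members are hypothetical names, and the instruction sequences in the comments are what javac typically emits for these method bodies.

```java
class AccessDemo {
    private int field = 7;

    // Reads a parameter. Typical bytecode: iload_1, ireturn
    int readArgument(int x) {
        return x;
    }

    // Reads a field through this. Typical bytecode: aload_0, getfield, ireturn
    int readField() {
        return field;
    }

    public static void main(String[] args) {
        AccessDemo d = new AccessDemo();
        System.out.println(d.readArgument(5) + " " + d.readField()); // prints "5 7"
    }
}
```

After compiling, running javap -c AccessDemo prints the instructions for each method, so the extra aload_0 and getfield pair can be seen directly. This shows only the bytecode; what the JIT turns it into at runtime may differ.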
final is a different story. It is basically a hint for the compiler/JIT that this reference won't change, so it can make some heavier optimizations. But this is much harder to track down; as a rule of thumb, use final whenever possible.

Such simple optimizations are already built into the JVM runtime. If the JVM accessed instance variables naively, our Java applications would be turtle slow.
Such manual tuning is probably worthwhile for simpler JVMs though, e.g. Android.

It tells the runtime (JIT) that, in the context of that method call, those 3 values will never change, so the runtime does not need to continually load the values from the member variables. This may give a slight speed improvement.
Of course, as the JIT gets smarter and can figure out these things on its own, these conventions become less useful.
Note: I didn't make it clear that the speedup comes more from using a local variable than from the final part.

In the generated VM opcodes, local variables live in the frame's local variable slots and are pushed onto the operand stack with a single load instruction, while field references must be moved to the stack via an instruction that retrieves the value through the object reference. I imagine the JIT can turn the local variable references into register references more easily.