I noticed that in the jdk source code and more specifically in the collections framework there is a preference in assigning variables right before reading them in expressions. Is it just a simple preference or is it something more important which I am not aware of? One reason that I can think of is that the variable is used in this expression only.
As I am not used to this style I find it hard to read it. The code is very condensed. Below you can see an example taken from java.util.HashMap.getNode()
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 && ...) {
...
}
As already mentioned in the comment: Doug Lea, who is one of the main authors of the collections framework and the concurrency packages, tends to do optimizations that may look confusing (or even counterintuitive) for mere mortals.
A "famous" example here is copying fields to local variables in order to minimize the size of the bytecode, which is actually also done with the table
field and the local tab
variable in the example that you referred to!
For very simple tests, it does not seem to make a difference (referring to the resulting bytecode size) whether the accesses are "inlined" or not. So I tried to create an example that roughly resembles the structures of the getNode
method that you mentioned: An access to a field that is an array, a length check, an access to a field of one array element...
- The
testSeparate
method does separate assignments and checks
- The
testInlined
method uses the assignment-in-if-style
- The
testRepeated
method (as a counterexample) does every access repeatedly
The code:
class Node
{
int k;
int j;
}
public class AssignAndUseTestComplex
{
public static void main(String[] args)
{
AssignAndUseTestComplex t = new AssignAndUseTestComplex();
t.testSeparate(1);
t.testInlined(1);
t.testRepeated(1);
}
private Node table[] = new Node[] { new Node() };
int testSeparate(int value)
{
Node[] tab = table;
if (tab != null)
{
int n = tab.length;
if (n > 0)
{
Node first = tab[(n-1)];
if (first != null)
{
return first.k+first.j;
}
}
}
return 0;
}
int testInlined(int value)
{
Node[] tab; Node first, e; int n;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1)]) != null) {
return first.k+first.j;
}
return 0;
}
int testRepeated(int value)
{
if (table != null)
{
if (table.length > 0)
{
if (table[(table.length-1)] != null)
{
return table[(table.length-1)].k+table[(table.length-1)].j;
}
}
}
return 0;
}
}
And the resulting bytecodes: The testSeparate
method uses 41 instructions:
int testSeparate(int);
Code:
0: aload_0
1: getfield #15 // Field table:[Lstackoverflow/Node;
4: astore_2
5: aload_2
6: ifnull 40
9: aload_2
10: arraylength
11: istore_3
12: iload_3
13: ifle 40
16: aload_2
17: iload_3
18: iconst_1
19: isub
20: aaload
21: astore 4
23: aload 4
25: ifnull 40
28: aload 4
30: getfield #37 // Field stackoverflow/Node.k:I
33: aload 4
35: getfield #41 // Field stackoverflow/Node.j:I
38: iadd
39: ireturn
40: iconst_0
41: ireturn
The testInlined
method indeed is a tad smaller, with 39 instructions
int testInlined(int);
Code:
0: aload_0
1: getfield #15 // Field table:[Lstackoverflow/Node;
4: dup
5: astore_2
6: ifnull 38
9: aload_2
10: arraylength
11: dup
12: istore 5
14: ifle 38
17: aload_2
18: iload 5
20: iconst_1
21: isub
22: aaload
23: dup
24: astore_3
25: ifnull 38
28: aload_3
29: getfield #37 // Field stackoverflow/Node.k:I
32: aload_3
33: getfield #41 // Field stackoverflow/Node.j:I
36: iadd
37: ireturn
38: iconst_0
39: ireturn
Finally, the testRepeated
method uses a whopping 63 instructions
int testRepeated(int);
Code:
0: aload_0
1: getfield #15 // Field table:[Lstackoverflow/Node;
4: ifnull 62
7: aload_0
8: getfield #15 // Field table:[Lstackoverflow/Node;
11: arraylength
12: ifle 62
15: aload_0
16: getfield #15 // Field table:[Lstackoverflow/Node;
19: aload_0
20: getfield #15 // Field table:[Lstackoverflow/Node;
23: arraylength
24: iconst_1
25: isub
26: aaload
27: ifnull 62
30: aload_0
31: getfield #15 // Field table:[Lstackoverflow/Node;
34: aload_0
35: getfield #15 // Field table:[Lstackoverflow/Node;
38: arraylength
39: iconst_1
40: isub
41: aaload
42: getfield #37 // Field stackoverflow/Node.k:I
45: aload_0
46: getfield #15 // Field table:[Lstackoverflow/Node;
49: aload_0
50: getfield #15 // Field table:[Lstackoverflow/Node;
53: arraylength
54: iconst_1
55: isub
56: aaload
57: getfield #41 // Field stackoverflow/Node.j:I
60: iadd
61: ireturn
62: iconst_0
63: ireturn
So it seems that this "obscure" way of writing the queries and assignments can, indeed, save a few bytes of bytecode, and (given the justification in the linked answer about storing fields in local variables) this may have been the reason to use this style.
But...
in any case: After the method has been executed a few times, the JIT will kick in, and the resulting machine code will have "nothing" to do with the original bytecode - and I'm pretty sure that all three versions will actually be compiled to the same machine code in the end.
So the bottom line is: Don't use this style. Instead, just write dumb code that is easy to read and maintain. You'll know when it's your turn to use "optimizations" like these.
EDIT: A short addendum...
I have made a further test, and compared the testSeparate
and testInlined
method concerning the actual machine code that is generated by the JIT.
I modified the main
method a bit, to prevent unrealistic over-optimizations or other shortcuts that the JIT might take, but the actual methods have been left unmodified.
And as expected: When calling the methods a few thousand times with a hotspot disassembly JVM and -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:+PrintAssembly
, then the actual machine code of both methods is identical.
So once more, the JIT does its job pretty well, and the programmer can focus on writing readable code (whatever that means).
... and and a minor correction/clarification:
I did not test the third method, testRepeated
, because it is not equivalent to the other methods (and thus, it can not result in the same machine code). This is, by the way, another small advantage of the strategy of storing fields in local variables: It offers a (very limited, but sometimes handy) form of "thread safety": It is made sure that the length of an array (like the tab
array in the getNode
method of HashMap
) can not change while the method is being executed.