I'm studying Effective Java, and in Item 5 of the book Joshua Bloch talks about avoiding the creation of unnecessary objects. One example involves mutable Date objects that are never modified once their values have been computed.
Here is the 'bad practice':
    public Person(Date birthDate) {
        this.birthDate = new Date(birthDate.getTime());
    }

    // DON'T DO THIS!
    public boolean isBabyBoomer() {
        // Unnecessary allocation of expensive object
        Calendar gmtCal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        gmtCal.set(1946, Calendar.JANUARY, 1, 0, 0, 0);
        Date boomStart = gmtCal.getTime();
        gmtCal.set(1965, Calendar.JANUARY, 1, 0, 0, 0);
        Date boomEnd = gmtCal.getTime();
        return birthDate.compareTo(boomStart) >= 0
            && birthDate.compareTo(boomEnd) < 0;
    }
The isBabyBoomer method unnecessarily creates a new Calendar, TimeZone, and two Date instances each time it is invoked - and that clearly makes sense to me.
And here is the improved code:
    public Person(Date birthDate) {
        this.birthDate = new Date(birthDate.getTime());
    }

    /**
     * The starting and ending dates of the baby boom.
     */
    private static final Date BOOM_START;
    private static final Date BOOM_END;

    static {
        Calendar gmtCal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        gmtCal.set(1946, Calendar.JANUARY, 1, 0, 0, 0);
        BOOM_START = gmtCal.getTime();
        gmtCal.set(1965, Calendar.JANUARY, 1, 0, 0, 0);
        BOOM_END = gmtCal.getTime();
    }

    public boolean isBabyBoomer() {
        return birthDate.compareTo(BOOM_START) >= 0
            && birthDate.compareTo(BOOM_END) < 0;
    }
The Calendar, TimeZone, and Date instances are created only once, when the class is initialized.
Bloch explains that this results in significant performance gains if the method isBabyBoomer() is invoked frequently.
On his machine:
Bad Version: 32,000 ms for 10 million invocations
Improved Version: 130 ms for 10 million invocations
But when I run the examples on my system, the performance is exactly the same for both versions (14 ms).
Is it a compiler feature that the instances are only created once?
Edit:
Here is my benchmark:
    public static void main(String[] args) {
        Calendar cal = Calendar.getInstance();
        cal.set(1960, Calendar.JANUARY, 1, 1, 1, 0);
        Person p = new Person(cal.getTime());
        long startTime = System.nanoTime();
        for (int i = 0; i < 10000000; i++) {
            p.isBabyBoomer();
        }
        long stopTime = System.nanoTime();
        long elapsedTime = stopTime - startTime;
        double mseconds = (double) elapsedTime / 1000000.0;
        System.out.println(mseconds);
    }
Cheers, Markus
Your benchmark is wrong. With the newest Java 7 and a proper warmup I get a dramatic difference between the two methods:
Person::main: estimatedSeconds 1 = '8,42'
Person::main: estimatedSeconds 2 = '0,01'
Here is the full runnable code:
    import java.util.Calendar;
    import java.util.Date;
    import java.util.TimeZone;

    public class Person {
        private Date birthDate;
        static Date BOOM_START;
        static Date BOOM_END;

        public Person(Date birthDate) {
            this.birthDate = new Date(birthDate.getTime());
        }

        static {
            Calendar gmtCal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
            gmtCal.set(1946, Calendar.JANUARY, 1, 0, 0, 0);
            BOOM_START = gmtCal.getTime();
            gmtCal.set(1965, Calendar.JANUARY, 1, 0, 0, 0);
            BOOM_END = gmtCal.getTime();
        }

        public boolean isBabyBoomerWrong() {
            // Unnecessary allocation of expensive object
            Calendar gmtCal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
            gmtCal.set(1946, Calendar.JANUARY, 1, 0, 0, 0);
            Date boomStart = gmtCal.getTime();
            gmtCal.set(1965, Calendar.JANUARY, 1, 0, 0, 0);
            Date boomEnd = gmtCal.getTime();
            return birthDate.compareTo(boomStart) >= 0
                && birthDate.compareTo(boomEnd) < 0;
        }

        public boolean isBabyBoomer() {
            return birthDate.compareTo(BOOM_START) >= 0
                && birthDate.compareTo(BOOM_END) < 0;
        }

        public static void main(String[] args) {
            Person p = new Person(new Date());

            // Warm-up: give the JIT a chance to compile both methods
            for (int i = 0; i < 10_000_000; i++) {
                p.isBabyBoomerWrong();
                p.isBabyBoomer();
            }

            long startTime = System.nanoTime();
            for (int i = 0; i < 10_000_000; i++) {
                p.isBabyBoomerWrong();
            }
            double estimatedSeconds = (System.nanoTime() - startTime) / 1000000000.0;
            System.out.println(String.format("Person::main: estimatedSeconds 1 = '%.2f'", estimatedSeconds));

            startTime = System.nanoTime();
            for (int i = 0; i < 10_000_000; i++) {
                p.isBabyBoomer();
            }
            estimatedSeconds = (System.nanoTime() - startTime) / 1000000000.0;
            System.out.println(String.format("Person::main: estimatedSeconds 2 = '%.2f'", estimatedSeconds));
        }
    }
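One further precaution, offered as a sketch rather than a definitive harness: a timing loop that ignores the boolean returned by each call leaves the JIT free, in principle, to drop work it can prove has no effect. The variant below counts the true results and prints the count so the work is actually used. It assumes Java 8 or later (for the lambdas), and the names WarmupBench, run, timeMillis, isBoomerAllocating, and isBoomerCached are hypothetical, introduced only for this illustration.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;
import java.util.function.BooleanSupplier;

// Hypothetical harness, not from the book: warm up first, then time, and use every result.
public class WarmupBench {
    private static final Date BOOM_START;
    private static final Date BOOM_END;

    static {
        Calendar gmtCal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        gmtCal.set(1946, Calendar.JANUARY, 1, 0, 0, 0);
        BOOM_START = gmtCal.getTime();
        gmtCal.set(1965, Calendar.JANUARY, 1, 0, 0, 0);
        BOOM_END = gmtCal.getTime();
    }

    // Slow variant: builds a Calendar and two Dates on every call.
    static boolean isBoomerAllocating(Date birthDate) {
        Calendar gmtCal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        gmtCal.set(1946, Calendar.JANUARY, 1, 0, 0, 0);
        Date boomStart = gmtCal.getTime();
        gmtCal.set(1965, Calendar.JANUARY, 1, 0, 0, 0);
        Date boomEnd = gmtCal.getTime();
        return birthDate.compareTo(boomStart) >= 0
            && birthDate.compareTo(boomEnd) < 0;
    }

    // Fast variant: compares against the constants computed once in the static initializer.
    static boolean isBoomerCached(Date birthDate) {
        return birthDate.compareTo(BOOM_START) >= 0
            && birthDate.compareTo(BOOM_END) < 0;
    }

    // Runs the task repeatedly and counts the true results so they are not ignored.
    static int run(BooleanSupplier task, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (task.getAsBoolean()) {
                hits++;
            }
        }
        return hits;
    }

    // Performs one untimed warm-up pass, then one timed pass; returns elapsed milliseconds.
    static double timeMillis(BooleanSupplier task, int iterations) {
        int sink = run(task, iterations);
        long start = System.nanoTime();
        sink += run(task, iterations);
        double ms = (System.nanoTime() - start) / 1_000_000.0;
        System.out.println("hits (printed so the results are used): " + sink);
        return ms;
    }

    public static void main(String[] args) {
        Date birthDate = new Date(-315_619_200_000L); // 1960-01-01T00:00:00 GMT
        int iterations = 1_000_000;
        System.out.println("allocating: "
            + timeMillis(() -> isBoomerAllocating(birthDate), iterations) + " ms");
        System.out.println("cached:     "
            + timeMillis(() -> isBoomerCached(birthDate), iterations) + " ms");
    }
}
```

The absolute numbers will vary with the JVM and the machine; for measurements you intend to rely on, a dedicated benchmarking harness is a safer choice than a hand-rolled loop like this.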
Your question turned out to be just another case of a wrong microbenchmark.
However, in some special cases (mostly with simple data-holding classes), there really is a JVM optimization that eliminates most object instantiations. You might want to look at the links below.
The techniques described there are obviously not applicable in your case, but they might make the difference in some other strange cases where object instantiation just doesn't seem to take any time. So remember this for when you actually come across a working example of your question:
- Scalar replacement: Automatic stack allocation in the java virtual machine (stefankrause.net)
- Allocation is faster than you think, and getting faster (ibm.com/developerworks).
The most relevant part:
Typical defensive copying approach to returning a compound value (don't really worry about the code, it's just that a Point will be instantiated and accessed via getter methods when the getDistanceFrom() method is invoked):
    public class Point {
        private int x, y;
        public Point(int x, int y) {
            this.x = x; this.y = y;
        }
        public Point(Point p) { this(p.x, p.y); }
        public int getX() { return x; }
        public int getY() { return y; }
    }

    public class Component {
        private Point location;
        public Point getLocation() { return new Point(location); }
        public double getDistanceFrom(Component other) {
            Point otherLocation = other.getLocation();
            int deltaX = otherLocation.getX() - location.getX();
            int deltaY = otherLocation.getY() - location.getY();
            return Math.sqrt(deltaX*deltaX + deltaY*deltaY);
        }
    }
The getLocation() method does not know what its caller is going to do with the Point it returns; it might retain a reference to it, such as putting it in a collection, so getLocation() is coded defensively. However in this example, getDistanceFrom() is not going to do this; it is just going to use the Point for a short time and then discard it, which seems like a waste of a perfectly good object.
A smart JVM can see what is going on and optimize away the allocation of the defensive copy. First, the call to getLocation() will be inlined, as will the calls to getX() and getY(), resulting in getDistanceFrom() effectively behaving like this:
(Pseudocode describing the result of applying inlining optimizations to getDistanceFrom())
    public double getDistanceFrom(Component other) {
        Point otherLocation = new Point(other.x, other.y);
        int deltaX = otherLocation.x - location.x;
        int deltaY = otherLocation.y - location.y;
        return Math.sqrt(deltaX*deltaX + deltaY*deltaY);
    }
At this point, escape analysis can show that the object allocated in the first line never escapes from its basic block and that getDistanceFrom() never modifies the state of the other component. (By escape, we mean that a reference to it is not stored into the heap or passed to unknown code that might retain a copy.) Given that the Point is truly thread-local and its lifetime is known to be bounded by the basic block in which it is allocated, it can be either stack-allocated or optimized away entirely, as shown here:
(Pseudocode describing the result of optimizing away allocation in getDistanceFrom())
    public double getDistanceFrom(Component other) {
        int tempX = other.x, tempY = other.y;
        int deltaX = tempX - location.x;
        int deltaY = tempY - location.y;
        return Math.sqrt(deltaX*deltaX + deltaY*deltaY);
    }
The result is that we get exactly the same performance as we would if
all the fields were public while retaining the safety that
encapsulation and defensive copying (among other safe coding
techniques) give us.
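If you want to experiment with this yourself, here is a runnable sketch of the quoted example. The wrapper class EscapeDemo, the Component constructor, and the sumDistances helper are my own additions for illustration and are not part of the article. On HotSpot you could run it once normally and once with -XX:-DoEscapeAnalysis to compare timings; whether a difference actually shows up depends on the JVM version and settings, so treat the numbers as an observation, not a guarantee.

```java
// Hypothetical demo class wrapping the Point/Component example from the quoted article.
public class EscapeDemo {

    static final class Point {
        private final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        Point(Point p) { this(p.x, p.y); }
        int getX() { return x; }
        int getY() { return y; }
    }

    static final class Component {
        private final Point location;

        // Assumed constructor: the article's listing does not show how location is set.
        Component(int x, int y) { this.location = new Point(x, y); }

        // Defensive copy: callers never see the internal Point.
        Point getLocation() { return new Point(location); }

        double getDistanceFrom(Component other) {
            Point otherLocation = other.getLocation(); // short-lived copy, never escapes
            int deltaX = otherLocation.getX() - location.getX();
            int deltaY = otherLocation.getY() - location.getY();
            return Math.sqrt(deltaX * deltaX + deltaY * deltaY);
        }
    }

    // Sums the distances so that every call's result is actually used.
    static double sumDistances(Component a, Component b, int iterations) {
        double total = 0;
        for (int i = 0; i < iterations; i++) {
            total += a.getDistanceFrom(b);
        }
        return total;
    }

    public static void main(String[] args) {
        Component a = new Component(0, 0);
        Component b = new Component(3, 4);
        int iterations = 10_000_000;

        sumDistances(a, b, iterations); // warm-up pass so the JIT can compile the hot path

        long start = System.nanoTime();
        double total = sumDistances(a, b, iterations);
        double ms = (System.nanoTime() - start) / 1_000_000.0;
        System.out.println("total = " + total + ", elapsed = " + ms + " ms");
    }
}
```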