I'm curious about this.
I wanted to check which function was faster, so I create a little code and I executed a lot of times.
public static void main(String[] args) {
long ts;
String c = "sgfrt34tdfg34";
ts = System.currentTimeMillis();
for (int k = 0; k < 10000000; k++) {
c.getBytes();
}
System.out.println("t1->" + (System.currentTimeMillis() - ts));
ts = System.currentTimeMillis();
for (int i = 0; i < 10000000; i++) {
Bytes.toBytes(c);
}
System.out.println("t2->" + (System.currentTimeMillis() - ts));
}
The "second" loop is faster, so, I thought that Bytes class from hadoop was faster than the function from String class. Then, I changed the order of the loops and then c.getBytes() got faster. I executed many times, and my conclusion was, I don't know why, but something happen in my VM after the first code execute so that the results become faster for the second loop.
This is a classic java benchmarking issue. Hotspot/JIT/etc will compile your code as you use it, so it gets faster during the run.
Run around the loop at least 3000 times (10000 on a server or on 64 bit) first - then do your measurements.
When you execute a method at least 10000 times, it triggers the whole method to be compiled. This means that your second loop can be
The best solution is to place each loop in a separate method so one loop doesn't optimise the other AND run this a few times, ignoring the first run.
e.g.
Few more observations
As pointed by @dasblinkenlight above, Hadoop's
Bytes.toBytes(c);
internally calls theString.getBytes("UTF-8")
The variant method
String.getBytes()
which takes Character Set as input is faster than the one that does not take any character set. So for a given string,getBytes("UTF-8")
would be faster thangetBytes()
. I have tested this on my machine (Windows8, JDK 7). Run the two loops one withgetBytes("UTF-8")
and other withgetBytes()
in sequence in equal iterations.this gives:
and the results are same even if you change order of executions of loop. To discount any JIT optimizations, I would suggest run the tests in separate methods to confirm this (as suggested by @Peter Lawrey above)
Bytes.toBytes(c)
should always be faster thanString.getBytes()
Most likely, the code was still compiling or not yet compiled at the time the first loop ran.
Wrap the entire method in an outer loop so you can run the benchmarks a few times, and you should see more stable results.
Read: Dynamic compilation and performance measurement.
It simply might be the case that you allocate so much space for objects with your calls to getBytes(), that the JVM Garbage Collector starts and cleans up the unused references (bringing out the trash).
You know there's something wrong, because
Bytes.toBytes
callsc.getBytes
internally:The source is taken from here. This tells you that the call cannot possibly be faster than the direct call - at the very best (i.e. if it gets inlined) it would have the same timing. Generally, though, you'd expect it to be a little slower, because of the small overhead in calling a function.
This is the classic problem with micro-benchmarking in interpreted, garbage-collected environments with components that run at arbitrary time, such as garbage collectors. On top of that, there are hardware optimizations, such as caching, that skew the picture. As the result, the best way to see what is going on is often to look at the source.