I am experimenting with Unsafe to iterate over memory instead of iterating over the values in a byte[]. A memory block is allocated with Unsafe, large enough to hold 65536 byte values.
I AM TRYING THIS:
char aChar = ...; // some character
if ((byte) 0 == (unsafe.getByte(base_address + aChar) & mask)) {
    // do something
}
INSTEAD OF:
char aChar = ...; // some character
if ((byte) 0 == (lookup[aChar] & mask)) {
    // do something
}
I thought Unsafe could access the memory faster than a regular array access, which performs an index (bounds) check on every access...
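Roughly, the full setup looks like the sketch below (a minimal, self-contained version; the mask value, the zero-fill with setMemory and the reflective lookup of Unsafe are placeholder details, not the exact code I use):

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeLookupSketch {

    // sun.misc.Unsafe has no public constructor, so grab the singleton reflectively.
    private static final Unsafe UNSAFE;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        final int SIZE = 65536;      // one slot per possible char value
        final byte mask = 0x01;      // placeholder mask

        // Unsafe-backed lookup "table": raw memory, no bounds checks.
        long baseAddress = UNSAFE.allocateMemory(SIZE);
        UNSAFE.setMemory(baseAddress, SIZE, (byte) 0);

        // Plain-array equivalent.
        byte[] lookup = new byte[SIZE];

        char aChar = 'A';            // some character

        // Unsafe version: raw address arithmetic.
        if ((byte) 0 == (UNSAFE.getByte(baseAddress + aChar) & mask)) {
            // do something
        }

        // Array version: the JVM performs (or hoists) an index check.
        if ((byte) 0 == (lookup[aChar] & mask)) {
            // do something
        }

        UNSAFE.freeMemory(baseAddress);  // Unsafe memory is not garbage collected
    }
}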
It was only wishful thinking that the JVM would have a special operation (Unsafe) that would somehow beat regular array access and iteration. The JVM, it seems to me, handles normal byte[] iteration just fine and does it as fast as can be with plain, unadulterated, vanilla Java code.
@millimoose hits the proverbial nail on the head:
"Unsafe might be useful for a lot of things, but this level of microoptimisation isn't one of them. – millimoose"
Using Unsafe is faster only in a strictly limited set of circumstances:
- (64-bit JVM only) faster for a single 65535-byte byte[] lookup done exactly once per test. In that case UnsafeLookup_8B on a 64-bit JVM is 24% faster. If the test is repeated so that each lookup is done twice, the normal method is then 30% faster than Unsafe. In pure interpreted mode on a cold JVM, Unsafe is faster by far, but only the first time and only for a small array size. On a 32-bit standard Oracle JVM 7.x, the normal method is three times faster than Unsafe.
Using Unsafe (in my tests) is slower:
- slower on both Oracle Java 64-bit and 32-bit virtual machines
- slower regardless of OS and machine architecture (32-bit and 64-bit)
- slower even if the -server JVM option is invoked
- slower by 9% or more (1 GB array and UnsafeLookup_8B, the fastest variant, in the code below on a 32-bit JVM; 64-bit was even slower?)
- slower by 234% or more (1 MB array and UnsafeLookup_1B, the fastest variant, in the code below on a 64-bit JVM)
Is there some reason for this?
When I run the code yellowB posted below (it checks a 1 GB byte[]), the normal version is still the fastest:
C:\Users\wilf>java -Xms1600m -Xprof -jar "S:\wilf\testing\dist\testing.jar"
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1967737 us.
use unsafeLookup_1B()...
Not found '0'
time : 2923367 us.
use unsafeLookup_8B()...
Not found '0'
time : 2495663 us.
Flat profile of 26.35 secs (2018 total ticks): main
Interpreted + native Method
0.0% 1 + 0 test.StackOverflow.main
0.0% 1 + 0 Total interpreted
Compiled + native Method
67.8% 1369 + 0 test.StackOverflow.main
11.7% 236 + 0 test.StackOverflow.unsafeLookup_8B
11.2% 227 + 0 test.StackOverflow.unsafeLookup_1B
9.1% 184 + 0 test.StackOverflow.normalLookup
99.9% 2016 + 0 Total compiled
Stub + native Method
0.0% 0 + 1 sun.misc.Unsafe.getLong
0.0% 0 + 1 Total stub
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 26.39 seconds:
100.0% 2023 Received ticks
C:\Users\wilf>java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
Java HotSpot(TM) Client VM (build 23.3-b01, mixed mode, sharing)
CPU: Intel Core 2 Duo E4600 @ 2.4 GHz, 4.00 GB RAM (3.25 GB usable), OS: Windows 7 (32-bit)
Running the test on a 4-core AMD64 with Windows 7 64-bit, 32-bit Java:
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1631142 us.
use unsafeLookup_1B()...
Not found '0'
time : 2365214 us.
use unsafeLookup_8B()...
Not found '0'
time : 1783320 us.
Running the test on a 4-core AMD64 with Windows 7 64-bit, 64-bit Java:
use normalLookup()...
Not found '0'
time : 655146 us.
use unsafeLookup_1B()...
Not found '0'
time : 904783 us.
use unsafeLookup_8B()...
Not found '0'
time : 764427 us.
Flat profile of 6.34 secs (13 total ticks): main
Interpreted + native Method
23.1% 3 + 0 java.io.PrintStream.println
23.1% 3 + 0 test.StackOverflow.unsafeLookup_8B
15.4% 2 + 0 test.StackOverflow.main
7.7% 1 + 0 java.io.DataInputStream.<init>
69.2% 9 + 0 Total interpreted
Compiled + native Method
7.7% 0 + 1 test.StackOverflow.unsafeLookup_1B
7.7% 0 + 1 test.StackOverflow.main
7.7% 0 + 1 test.StackOverflow.normalLookup
7.7% 0 + 1 test.StackOverflow.unsafeLookup_8B
30.8% 0 + 4 Total compiled
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 6.35 seconds:
100.0% 14 Received ticks
42.9% 6 Compilation
Unsafe methods may be marked as native, but that does not mean they are necessarily JNI. Almost all of the Unsafe methods are intrinsics (see a short post here: http://psy-lob-saw.blogspot.co.uk/2012/10/java-intrinsics-are-not-jni-calls.html); on the Sun JVM they are, in many cases, converted to a single assembly instruction. Other JVMs may or may not be as good at dealing with intrinsics and may convert them to JNI calls or plain Java calls. From what I know, JRockit tends to go the JNI way, and so does the Android JVM.
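One way to check what a particular HotSpot build actually does with these calls is to have it print its inlining decisions (these are diagnostic flags, so availability may vary by build); intrinsified methods are flagged as intrinsic in the output. Something along these lines:

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -jar testing.jar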
I think the two functions you posted are basically the same, because they both read only 1 byte at a time, convert it to an int, and then do the comparison.
Reading a 4-byte int or an 8-byte long on each access is much more effective. I wrote two functions that do the same thing: compare the contents of two byte[] arrays to see if they are equal:
function 1:
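A minimal sketch of such a per-byte comparison (the names here are illustrative, not necessarily those in the original test):

// Compare two byte[] one byte at a time.
static boolean equalsByByte(byte[] a, byte[] b) {
    if (a.length != b.length) {
        return false;
    }
    for (int i = 0; i < a.length; i++) {
        if (a[i] != b[i]) {
            return false;
        }
    }
    return true;
}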
function 2:
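A minimal sketch of the 8-bytes-at-a-time version, assuming an UNSAFE handle obtained via the usual theUnsafe reflection hack (as in the sketch in the question above):

// Compare two byte[] eight bytes at a time; identical longs XOR to 0.
static boolean equalsByLong(byte[] a, byte[] b) {
    if (a.length != b.length) {
        return false;
    }
    long base = UNSAFE.arrayBaseOffset(byte[].class);  // offset of element 0 inside the array object
    int longCount = a.length / 8;
    for (int i = 0; i < longCount; i++) {
        long offset = base + ((long) i << 3);
        if ((UNSAFE.getLong(a, offset) ^ UNSAFE.getLong(b, offset)) != 0L) {
            return false;
        }
    }
    // Compare any remaining tail bytes individually.
    for (int i = longCount * 8; i < a.length; i++) {
        if (a[i] != b[i]) {
            return false;
        }
    }
    return true;
}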
I ran these two functions on my laptop (Core i7 2630QM, 8 GB DDR3, 64-bit Windows 7, 64-bit HotSpot JVM), comparing two 400 MB byte[] arrays; the results are below:
function 1: ~670ms
function 2: ~80ms
Function 2 is much faster.
So my suggestion is to read 8 bytes at a time and use the XOR operator (^), as in function 2 above.
============================================================================
Hi Wilf, I used your code to make a test class as below; it compares the speed of three functions at looking up the first 0 in a byte array:
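A minimal sketch of such a test class (the method names follow the profiler output above; the loop bodies, in particular the zero-byte test in unsafeLookup_8B, are illustrative rather than the original code):

import java.lang.reflect.Field;
import java.util.Arrays;
import sun.misc.Unsafe;

public class StackOverflow {

    private static final Unsafe UNSAFE;
    private static final long BYTE_BASE;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            BYTE_BASE = UNSAFE.arrayBaseOffset(byte[].class);
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Plain array iteration: index of the first 0 byte, or -1.
    static int normalLookup(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0) return i;
        }
        return -1;
    }

    // Unsafe, one byte per read.
    static int unsafeLookup_1B(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (UNSAFE.getByte(data, BYTE_BASE + i) == 0) return i;
        }
        return -1;
    }

    // Unsafe, eight bytes per read; drop to per-byte checks only for words containing a 0 byte.
    static int unsafeLookup_8B(byte[] data) {
        int words = data.length / 8;
        for (int i = 0; i < words; i++) {
            long w = UNSAFE.getLong(data, BYTE_BASE + ((long) i << 3));
            // Classic SWAR test: non-zero exactly when some byte of w is zero.
            if (((w - 0x0101010101010101L) & ~w & 0x8080808080808080L) != 0) {
                for (int j = i * 8; j < i * 8 + 8; j++) {
                    if (data[j] == 0) return j;
                }
            }
        }
        for (int i = words * 8; i < data.length; i++) {
            if (data[i] == 0) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("initialize data...");
        byte[] data = new byte[1 << 30];     // ~1 GB; run with a large heap, e.g. -Xms1600m
        Arrays.fill(data, (byte) 1);         // no 0 byte anywhere, so every lookup scans the whole array
        System.out.println("initialize data done!");

        System.out.println("use normalLookup()...");
        long t = System.nanoTime();
        report(normalLookup(data), t);

        System.out.println("use unsafeLookup_1B()...");
        t = System.nanoTime();
        report(unsafeLookup_1B(data), t);

        System.out.println("use unsafeLookup_8B()...");
        t = System.nanoTime();
        report(unsafeLookup_8B(data), t);
    }

    private static void report(int index, long startNanos) {
        long micros = (System.nanoTime() - startNanos) / 1000;
        System.out.println(index < 0 ? "Not found '0'" : "found '0' at index " + index);
        System.out.println("time : " + micros + " us.");
    }
}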
The output shows that even reading 1 byte at a time with Unsafe.getByte() is much faster than iterating over the byte[] normally, and reading 8-byte longs is the fastest.
One possible reason why the range checking might not be a factor is the JIT compiler's optimizer. Since the array's size never changes, it may be possible for the optimizer to "hoist" all of the range checking and perform it once at the start of the loop.
By contrast, the JIT compiler might be unable to optimize (e.g. inline) the Unsafe.getByte() call. (Or maybe the getByte method has a read barrier...) However, this is speculation. The way to be sure is to get the JVM to dump out the JIT-compiled native code for the two cases and compare them instruction by instruction.
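On HotSpot this can be done with something along these lines, assuming the hsdis disassembler library is installed where the JVM can find it (without it, PrintAssembly only prints a warning):

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -jar testing.jar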