I wanted to port a state-of-the-art hash function MeiYan from C to Go. (As far as I know this is one of the best if not just the best hash function for hash tables in terms of speed and collision rate, it beats MurMur at least.)
I am new to Go, just spent one weekend with it, and came up with this version:
func meiyan(key *byte, count int) uint32 {
type P *uint32;
var h uint32 = 0x811c9dc5;
for ;count >= 8; {
a := ((*(*uint32)(unsafe.Pointer(key))) << 5)
b := ((*(*uint32)(unsafe.Pointer(key))) >> 27)
c := *(*uint32)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 4))
h = (h ^ ((a | b) ^ c)) * 0xad3e7
count -= 8
key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 8))
}
if (count & 4) != 0 {
h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
}
if (count & 2) != 0 {
h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
}
if (count & 1) != 0 {
h = (h ^ uint32(*key));
h = h * 0xad3e7
}
return h ^ (h >> 16);
}
Looks messy, but I do not think I can make it look better. Now I measure the speed and it is frustratingly slow, 3 times slower than C/C++ when compiled with gccgo -O3
. Can this be made faster? Is this just as good as compiler can make it or unsafe.Pointer
conversion is just as slow as it gets? In fact this surprised me, because I have seen that some other number crunching style code was just as fast as C or even faster. Am I doing something inneficiently here?
Here is the original C code I am porting from:
u32 meiyan(const char *key, int count) {
typedef u32* P;
u32 h = 0x811c9dc5;
while (count >= 8) {
h = (h ^ ((((*(P)key) << 5) | ((*(P)key) >> 27)) ^ *(P)(key + 4))) * 0xad3e7;
count -= 8;
key += 8;
}
#define tmp h = (h ^ *(u16*)key) * 0xad3e7; key += 2;
if (count & 4) { tmp tmp }
if (count & 2) { tmp }
if (count & 1) { h = (h ^ *key) * 0xad3e7; }
#undef tmp
return h ^ (h >> 16);
}
Here is how I measure speed:
func main(){
T := time.Now().UnixNano()/1e6
buf := []byte("Hello World!")
var controlSum uint64 = 0
for x := 123; x < 1e8; x++ {
controlSum += uint64(meiyan(&buf[0], 12))
}
fmt.Println(time.Now().UnixNano()/1e6 - T, "ms")
fmt.Println("controlSum:", controlSum)
}
After some careful research I found out why my code was slow, and improved it so it is now faster than the C version in my tests:
The hash function itself was already fast, but dereferencing the array on each iteration is what made it slow:
&buf[0]
was replaced withstart := &buf[0]
and then usestart
on each iteration.The implementation from NATS looks impressive! On my machine, for a data of length 30 (bytes) op/sec 157175656.56 and nano-sec/op 6.36! Take a look at it. You might find some ideas.