I am a Go beginner and I'm stuck on a problem.
I want to encode a string as UTF-16 little-endian and then hash it with MD5 (hexadecimal output). I found a piece of Python code that does exactly what I want, but I have not been able to translate it to Go.
md5 = hashlib.md5()
md5.update(challenge.encode('utf-16le'))
response = md5.hexdigest()
The challenge is a variable containing a string.
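You can do it with less work (or at least more readably, IMO) by using golang.org/x/text/encoding/unicode and golang.org/x/text/transform to create a Writer chain that does the encoding and hashing without so much manual byte-slice handling. Note that unicode in the code below refers to golang.org/x/text/encoding/unicode, not the standard library's unicode package. The equivalent function, which returns the raw digest (format it with %x to get the hexadecimal form):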
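func utf16leMd5(s string) []byte {
	// UTF-16 little-endian encoder that writes no byte-order mark.
	enc := unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM).NewEncoder()
	hasher := md5.New()
	// Bytes written to t are transcoded and then passed on to the hasher.
	t := transform.NewWriter(hasher, enc)
	t.Write([]byte(s))
	t.Close() // flush any input the transformer is still holding
	return hasher.Sum(nil)
}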
You can use the unicode/utf16 package for UTF-16 encoding. utf16.Encode() returns the UTF-16 encoding of a Unicode code point sequence (a slice of runes: []rune). You can simply convert a string to a slice of runes, e.g. []rune("some string"), and you can easily produce the byte sequence of the little-endian encoding by ranging over the uint16 codes and appending first the low byte, then the high byte, to the output (this is what little-endian means).
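To make the byte order concrete, here is a minimal sketch that prints the UTF-16 code units of a string and the little-endian bytes derived from them. The helper name utf16leBytes is made up for this illustration. Note that a character outside the Basic Multilingual Plane comes out of utf16.Encode() as a surrogate pair, i.e. two uint16 values and therefore four bytes.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf16leBytes is a hypothetical helper: it encodes s as UTF-16 and
// lays out each 16-bit code unit low byte first (little-endian).
func utf16leBytes(s string) []byte {
	codes := utf16.Encode([]rune(s))
	b := make([]byte, 0, len(codes)*2)
	for _, c := range codes {
		b = append(b, byte(c), byte(c>>8))
	}
	return b
}

func main() {
	fmt.Printf("% x\n", utf16leBytes("Hi")) // 48 00 69 00

	// U+1F600 lies outside the BMP, so it becomes the surrogate pair D83D DE00.
	fmt.Printf("%04X\n", utf16.Encode([]rune("\U0001F600"))) // [D83D DE00]
	fmt.Printf("% x\n", utf16leBytes("\U0001F600"))          // 3d d8 00 de
}
```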
For little-endian encoding, you can alternatively use the encoding/binary package: it has an exported LittleEndian variable, which has a PutUint16() method.
As for the MD5 checksum, the crypto/md5 package has what you want: md5.Sum() simply returns the MD5 checksum of the byte slice passed to it.
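One detail that can trip up a beginner: md5.Sum() returns a [16]byte array, not a slice, so it has to be sliced before being handed to functions that expect []byte. The sketch below shows two ways of getting the hexadecimal form that Python's hexdigest() produces. The helper name md5Hex is made up for this illustration, and the input "abc" is just a well-known MD5 test vector.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// md5Hex is a hypothetical helper that mirrors Python's hexdigest().
func md5Hex(b []byte) string {
	sum := md5.Sum(b) // [16]byte, an array rather than a slice
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(md5Hex([]byte("abc"))) // 900150983cd24fb0d6963f7d28e17f72

	// The %x verb formats the array directly and yields the same text.
	fmt.Printf("%x\n", md5.Sum([]byte("abc")))
}
```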
Here's a little function that captures what you want to do:
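func utf16leMd5(s string) [16]byte {
	// Encode the string's code points as UTF-16 code units.
	codes := utf16.Encode([]rune(s))
	b := make([]byte, len(codes)*2)
	for i, r := range codes {
		// Little-endian: low byte first, then high byte.
		b[i*2] = byte(r)
		b[i*2+1] = byte(r >> 8)
	}
	// md5.Sum returns the digest as a [16]byte array.
	return md5.Sum(b)
}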
Using it:
s := "Hello, playground"
fmt.Printf("%x\n", utf16leMd5(s))
s = "エヌガミ"
fmt.Printf("%x\n", utf16leMd5(s))
Output:
8f4a54c6ac7b88936e990256cc9d335b
5f0db9e9859fd27f750eb1a212ad6212
The variant that uses encoding/binary would look like this:
for i, r := range codes {
	binary.LittleEndian.PutUint16(b[i*2:], r)
}
(This may be marginally slower because it re-slices b on every iteration, although the difference is unlikely to matter in practice.)
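If you want to convince yourself that the two loops are interchangeable, a small sketch like the following compares their output byte for byte. The function names shiftBytes and binaryBytes are made up for this illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// shiftBytes writes each code unit low byte first using a shift.
func shiftBytes(codes []uint16) []byte {
	b := make([]byte, len(codes)*2)
	for i, r := range codes {
		b[i*2] = byte(r)
		b[i*2+1] = byte(r >> 8)
	}
	return b
}

// binaryBytes produces the same layout through encoding/binary.
func binaryBytes(codes []uint16) []byte {
	b := make([]byte, len(codes)*2)
	for i, r := range codes {
		binary.LittleEndian.PutUint16(b[i*2:], r)
	}
	return b
}

func main() {
	codes := utf16.Encode([]rune("エヌガミ"))
	fmt.Println(bytes.Equal(shiftBytes(codes), binaryBytes(codes))) // true
}
```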
For reference, I used this complete Python program:
import hashlib
import codecs
md5 = hashlib.md5()
md5.update(codecs.encode('Hello, playground', 'utf-16le'))
response = md5.hexdigest()
print(response)
It prints 8f4a54c6ac7b88936e990256cc9d335b
Here is the Go equivalent: https://play.golang.org/p/Nbzz1dCSGI
package main

import (
	"crypto/md5"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"unicode/utf16"
)

func main() {
	s := "Hello, playground"
	fmt.Println(md5Utf16le(s))
}

func md5Utf16le(s string) string {
	encoded := utf16.Encode([]rune(s))
	b := convertUTF16ToLittleEndianBytes(encoded)
	return md5Hexadecimal(b)
}

func md5Hexadecimal(b []byte) string {
	h := md5.New()
	h.Write(b)
	return hex.EncodeToString(h.Sum(nil))
}

func convertUTF16ToLittleEndianBytes(u []uint16) []byte {
	b := make([]byte, 2*len(u))
	for index, value := range u {
		binary.LittleEndian.PutUint16(b[index*2:], value)
	}
	return b
}