The preferred way of converting []byte to string is this:
var b []byte
// fill b
s := string(b)
This code copies the byte slice, which can be a problem in situations where performance is important.
When performance is critical, one can consider performing the unsafe conversion:
var b []byte
// fill b
s := *(*string)(unsafe.Pointer(&b))
My question is: what can go wrong when using the unsafe conversion? I know that a string should be immutable, and that if we change b, s will also change. But so what? Is that the only bad thing that can happen?
Modifying something that the language spec guarantees to be immutable is an act of treason.
Since the spec guarantees that strings are immutable, compilers are allowed to generate code that caches their values and performs other optimizations based on this. You can't change the value of a string in any normal way, and if you resort to dirty ways (like package unsafe) to do it anyway, you lose all the guarantees provided by the spec. By continuing to use the modified strings, you may randomly run into "bugs" and unexpected behavior.
For example, if you use a string as a key in a map and change the string after putting it into the map, you might not be able to find the associated value using either the original or the modified value of the string (this is implementation dependent).
To demonstrate this, see this example:
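m := map[string]int{}
b := []byte("hi")
s := *(*string)(unsafe.Pointer(&b)) // s shares b's backing array
m[s] = 999
fmt.Println("Before:", m)
b[0] = 'b' // mutates the key that is already stored in the map
fmt.Println("After:", m)
fmt.Println("But it's there:", m[s], m["bi"])
// Grow the map so that its internal data structure gets reorganized.
for i := 0; i < 1000; i++ {
	m[strconv.Itoa(i)] = i
}
fmt.Println("Now it's GONE:", m[s], m["bi"])
// Iteration visits every entry, so the mutated key still shows up here.
for k, v := range m {
	if k == "bi" {
		fmt.Println("But still there, just in a different bucket: ", k, v)
	}
}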
Output (try it on the Go Playground):
Before: map[hi:999]
After: map[bi:<nil>]
But it's there: 999 999
Now it's GONE: 0 0
But still there, just in a different bucket: bi 999
At first, we just see a weird result: a simple Println() is not able to find the value. It sees something (the key is found), but the value is displayed as nil, which is not even a valid value for the value type int (the zero value for int is 0).
If we grow the map (by adding 1000 elements), the internal data structure of the map gets restructured. After this, we're not even able to find our value by explicitly asking for it with the appropriate key. It is still in the map, as we find it when iterating over all its key-value pairs, but since the hash code changes as the value of the string changes, it is most likely searched for in a different bucket than the one it is in (or the one it should be in).
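The bucket mismatch can be sketched outside the runtime. The following is only an illustration, not how Go's map is actually implemented: it uses the standard hash/maphash package (maphash.String requires Go 1.19 or later) as a stand-in for the map's internal hash, and keyHash and bucketFor are hypothetical helpers introduced for this example.

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// seed is fixed for the lifetime of the process, so equal keys hash equally.
var seed = maphash.MakeSeed()

// keyHash is a hypothetical stand-in for the hash a map computes for a key.
func keyHash(key string) uint64 {
	return maphash.String(seed, key)
}

// bucketFor is a hypothetical helper that picks a bucket the way a simple
// hash table might: the key's hash modulo the number of buckets.
func bucketFor(key string, buckets uint64) uint64 {
	return keyHash(key) % buckets
}

func main() {
	const buckets = 1024
	// The entry was stored under the hash of "hi" ...
	fmt.Println("stored in bucket:   ", bucketFor("hi", buckets))
	// ... but after the mutation, a lookup hashes "bi" instead.
	fmt.Println("looked up in bucket:", bucketFor("bi", buckets))
	fmt.Println("same hash:", keyHash("hi") == keyHash("bi"))
}
```

Because the seed is random per process, the concrete bucket numbers vary from run to run; the point is only that the two keys are hashed independently, so the lookup generally starts in a different place than where the entry was stored.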
Also note that code using package unsafe may work as you expect now, but the same code might behave completely differently (meaning it may break) with a future (or older) version of Go, as "packages that import unsafe may be non-portable and are not protected by the Go 1 compatibility guidelines".
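The cast *(*string)(unsafe.Pointer(&b)) reinterprets a slice header as a string header, so it depends on how the two are laid out in memory. Go 1.20 added unsafe.String and unsafe.SliceData, which express the same zero-copy conversion without relying on that layout. The sketch below assumes Go 1.20 or later; bytesToString is a hypothetical helper name. It does not make the result any safer to mutate: the string still shares memory with the slice.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString is a hypothetical helper. It returns a string that shares
// b's backing array, so b must never be modified while the string is in use.
// Requires Go 1.20 or later.
func bytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := []byte("hello")
	s := bytesToString(b)
	fmt.Println(s) // prints "hello"
}
```

All the problems described in this answer apply unchanged if b is modified after the conversion.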
You may also run into unexpected errors because the modified string might be used in different ways. Someone might just copy the string header, while someone else may copy its content. See this example:
b := []byte{'h', 'i'}
s := *(*string)(unsafe.Pointer(&b))
s2 := s // Copy string header
s3 := string([]byte(s)) // New string header but same content
fmt.Println(s, s2, s3)
b[0] = 'b'
fmt.Println(s == s2)
fmt.Println(s == s3)
We created 2 new local variables, s2 and s3, using s: s2 is initialized by copying the string header of s, and s3 is initialized with a new string value (a new string header) but with the same content. Now if you modify the original s, you would expect that in a correct program comparing either new string to the original gives the same result, be it true or false (depending on whether values were cached, but it should be the same for both).
But the output is (try it on the Go Playground):
hi hi hi
true
false
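The two kinds of copy can also be told apart without mutating anything, by comparing the data pointers behind the strings. This is a minimal sketch assuming Go 1.20 or later (for unsafe.StringData); sharesData, headerCopy and contentCopy are hypothetical helpers, and the content copy here uses strings.Clone, which is documented to copy into a new allocation, instead of the string([]byte(s)) round trip from the example above.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// sharesData reports whether a and b point at the same backing bytes.
// Both strings must be non-empty: unsafe.StringData is unspecified for "".
func sharesData(a, b string) bool {
	return unsafe.StringData(a) == unsafe.StringData(b)
}

// headerCopy returns s itself: only the string header (pointer and length)
// is copied, the bytes are shared.
func headerCopy(s string) string { return s }

// contentCopy returns a string with the same content in a fresh allocation.
func contentCopy(s string) string { return strings.Clone(s) }

func main() {
	s := "hi"
	fmt.Println(sharesData(s, headerCopy(s)))  // true
	fmt.Println(sharesData(s, contentCopy(s))) // false
}
```

A header copy keeps following the shared bytes when they are modified through the slice, while a content copy keeps the old bytes, which is why the two comparisons in the output above disagree.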