This question is about the speed of accessing elements of arrays and slices, not about the efficiency of passing them to functions as arguments.
I would expect arrays to be faster than slices in most cases because a slice is a data structure describing a contiguous section of an array and so there may be an extra step involved when accessing elements of a slice (indirectly the elements of its underlying array).
So I wrote a little test to benchmark a batch of simple operations. There are 4 benchmark functions, the first 2 test a global slice and a global array, the other 2 test a local slice and a local array:
var gs = make([]byte, 1000) // Global slice
var ga [1000]byte // Global array
func BenchmarkSliceGlobal(b *testing.B) {
for i := 0; i < b.N; i++ {
for j, v := range gs {
gs[j]++; gs[j] = gs[j] + v + 10; gs[j] += v
}
}
}
func BenchmarkArrayGlobal(b *testing.B) {
for i := 0; i < b.N; i++ {
for j, v := range ga {
ga[j]++; ga[j] = ga[j] + v + 10; ga[j] += v
}
}
}
func BenchmarkSliceLocal(b *testing.B) {
var s = make([]byte, 1000)
for i := 0; i < b.N; i++ {
for j, v := range s {
s[j]++; s[j] = s[j] + v + 10; s[j] += v
}
}
}
func BenchmarkArrayLocal(b *testing.B) {
var a [1000]byte
for i := 0; i < b.N; i++ {
for j, v := range a {
a[j]++; a[j] = a[j] + v + 10; a[j] += v
}
}
}
I ran the test multiple times, here is the typical output (go test -bench .*
):
BenchmarkSliceGlobal 300000 4210 ns/op
BenchmarkArrayGlobal 300000 4123 ns/op
BenchmarkSliceLocal 500000 3090 ns/op
BenchmarkArrayLocal 500000 3768 ns/op
Analyzing the results:
Accessing the global slice is slightly slower than accessing the global array which is as I expected:
4210
vs 4123
ns/op
But accessing the local slice is significantly faster than accessing the local array:
3090
vs 3768
ns/op
My question is: What is the reason for this?
Notes
I tried varying the following things but none changed the outcome:
- the size of the array/slice (tried 100, 1000, 10000)
- the order of the benchmark functions
- the element type of the array/slice (tried
byte
andint
)
Comparing the amd64 assembly of both
BenchmarkArrayLocal
andBenchmarkSliceLocal
(too long to fit in this post):The array version loads the address of
a
from memory multiple times, practically on every array-access operation:Whereas the slice version is computing exclusively on registers after loading once from memory:
This is not conclusive but probably the cause. Reason being that both methods are otherwise virtually identical. One other notable detail is that the array version calls into
runtime.duffcopy
, which is a quite long assembly routine, whereas the slice version doesn't.Go version 1.8 can eliminate some range checks so the difference got bigger.
BenchmarkSliceGlobal-4 500000 3220 ns/op BenchmarkArrayGlobal-4 1000000 1287 ns/op BenchmarkSliceLocal-4 1000000 1267 ns/op BenchmarkArrayLocal-4 1000000 1301 ns/op
For arrays I'd recommend to use sizes from powers of two and include a logical and operation. In that way you're sure the compiler eliminates the check. Thus
var ga [1024]byte
withga[j & 1023]
.