Why does appending to a slice perform so badly?

Posted 2019-04-26 20:24

Question:

I'm currently creating a game in Go and measuring the FPS. I'm seeing a loss of about 7 FPS when I use a for loop to append to slices like so:

vertexInfo := Opengl.OpenGLVertexInfo{}

for i := 0; i < 4; i = i + 1 {
    vertexInfo.Translations = append(vertexInfo.Translations, float32(s.x), float32(s.y), 0)
    vertexInfo.Rotations = append(vertexInfo.Rotations, 0, 0, 1, s.rot)
    vertexInfo.Scales = append(vertexInfo.Scales, s.xS, s.yS, 0)
    vertexInfo.Colors = append(vertexInfo.Colors, s.r, s.g, s.b, s.a)

}

I'm doing this for every sprite, on every draw. Why do I get such a large performance hit from just looping four times and appending the same values to these slices? Is there a more efficient way to do this? It is not as if I'm adding an exorbitant amount of data. Each slice ends up with at most 16 elements, as shown above (4 x 4).

When I instead put all 16 elements in a single []float32{1..16} literal, the FPS improves by about 4.

Update: I benchmarked each append, and each one seems to cost about 1 FPS. That seems like a lot, considering this data is fairly static and I only need 4 iterations.

Update: Added github repo https://github.com/Triangle345/GT

Answer 1:

The builtin append() has to allocate a new backing array whenever the capacity of the destination slice is smaller than the length the slice would have after the append. It then also has to copy the existing elements from the old array into the newly allocated one, so there is considerable overhead.

The slices you append to are most likely empty (nil), since you created your Opengl.OpenGLVertexInfo value with an empty composite literal. Although append() plans ahead and allocates a larger array than is strictly needed for the appended elements, in your case several reallocations will probably still be needed to complete the 4 iterations.
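To see this effect in isolation, here is a minimal sketch that counts how often append has to move to a new backing array. The helper countReallocs is a hypothetical name made up for this illustration, not part of the asker's code. It relies on the fact that a reallocation always shows up as a change in cap().

```go
package main

import "fmt"

// countReallocs (hypothetical helper) appends perIter values in each of
// iters iterations to a slice that starts with the given capacity, and
// returns how many times the capacity changed, i.e. how many times
// append had to allocate a new backing array and copy the elements.
func countReallocs(initialCap, iters, perIter int) int {
	s := make([]float32, 0, initialCap)
	chunk := make([]float32, perIter) // the values appended each iteration
	reallocs := 0
	for i := 0; i < iters; i++ {
		before := cap(s)
		s = append(s, chunk...)
		if cap(s) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	// Starting with no capacity: at least one reallocation; the exact
	// number depends on the growth strategy of the Go version in use.
	fmt.Println("starting empty:     ", countReallocs(0, 4, 4))
	// Starting with room for all 16 elements: append reuses the array.
	fmt.Println("capacity 16 upfront:", countReallocs(16, 4, 4))
}
```

The count for the empty case is left unstated on purpose, because the growth strategy is an implementation detail. The preallocated case performs no reallocation, since append reuses the backing array whenever the capacity is sufficient.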

You may avoid reallocations if you create and initialize vertexInfo like this:

vertexInfo := Opengl.OpenGLVertexInfo{
    Translations: []float32{float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0},
    Rotations:    []float64{0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot},
    Scales:       []float64{s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0},
    Colors:       []float64{s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a},
}

With this struct literal, each slice is allocated once at its final size, so no backing array has to be reallocated while it is being filled. But if code elsewhere (which we don't see) appends further elements to these slices, those appends may cause reallocations. In that case, create the slices with a larger capacity that covers the future appends (e.g. make([]float64, 16, 32)).
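If you would rather keep the loop, a sketch along the following lines preallocates the capacity and then appends as before. The vertexInfo type here is a cut-down, hypothetical stand-in for Opengl.OpenGLVertexInfo (only two fields, both assumed to be []float32), and newVertexInfo and fill are names invented for the example. Note that the slices are created with length 0 and a non-zero capacity: make([]float32, 12) would create 12 zero elements, and append would then add after them.

```go
package main

import "fmt"

// vertexInfo is a hypothetical, reduced stand-in for
// Opengl.OpenGLVertexInfo; the field types are an assumption.
type vertexInfo struct {
	Translations []float32
	Colors       []float32
}

const vertsPerSprite = 4

// newVertexInfo creates the slices with length 0 but enough capacity
// for one sprite, so the appends in fill never have to reallocate.
func newVertexInfo() vertexInfo {
	return vertexInfo{
		Translations: make([]float32, 0, vertsPerSprite*3),
		Colors:       make([]float32, 0, vertsPerSprite*4),
	}
}

// fill appends the same per-vertex values once for each of the 4 vertices.
func fill(v *vertexInfo, x, y, r, g, b, a float32) {
	for i := 0; i < vertsPerSprite; i++ {
		v.Translations = append(v.Translations, x, y, 0)
		v.Colors = append(v.Colors, r, g, b, a)
	}
}

func main() {
	v := newVertexInfo()
	fill(&v, 1, 2, 1, 1, 1, 1)
	// Length equals capacity after filling: the single allocation made
	// by make was exactly enough.
	fmt.Println(len(v.Translations), cap(v.Translations)) // 12 12
	fmt.Println(len(v.Colors), cap(v.Colors))             // 16 16
}
```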



Answer 2:

An empty slice has no capacity, so the first append must allocate memory. Every later append that exceeds the current capacity has to allocate again and copy the existing elements.

To speed it up, use a fixed-size array, use make to create a slice with the required length or capacity, or initialize the slice with its items when you declare it.
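As a rough sketch of the fixed-size array option, the example below fills a [16]float32 by index instead of appending. The names spriteColors and colorsFor are made up for the illustration, and float32 elements are an assumption. Whether the array ends up on the stack or the heap depends on the compiler's escape analysis, so treat the benefit as something to measure rather than a guarantee.

```go
package main

import "fmt"

// spriteColors holds 4 vertices x 4 color components. The size is part
// of the type, so filling it never grows or reallocates anything.
type spriteColors [16]float32

// colorsFor (hypothetical helper) writes the same RGBA value for each
// of the 4 vertices by index.
func colorsFor(r, g, b, a float32) spriteColors {
	var c spriteColors
	for i := 0; i < 4; i++ {
		c[i*4+0] = r
		c[i*4+1] = g
		c[i*4+2] = b
		c[i*4+3] = a
	}
	return c
}

func main() {
	c := colorsFor(1, 0.5, 0.25, 1)
	// c[:] gives a slice view of the array for APIs that expect a
	// []float32; it shares the array's storage instead of copying it.
	s := c[:]
	fmt.Println(len(s), s[4], s[5])
}
```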