Cache Friendly Vertex Definition

Posted 2019-05-17 17:53

I am writing an OpenGL application, and for vertex positions, normals, and colors I am using separate buffers, as follows:

GLuint vertex_buffer, normal_buffer, color_buffer;

My supervisor tells me that if I define a struct like this:

struct vertex {
    glm::vec3 pos;
    glm::vec3 normal;
    glm::vec3 color;
};
GLuint vertex_buffer;

and then define a single buffer of these vertices, my application will get much faster, because when a position is cached, the corresponding normal and color will be in the same cache line.

What I think is that defining such a struct will not have much effect on performance. With the struct layout, fewer vertices fit in each cache line, while with separate buffers the cache holds three different cache lines, one each for positions, normals, and colors. So nothing really changes. Is that true?
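The two layouts in the question can be compared with a little cache-line arithmetic. The C++ sketch below is a simplified model, not a GPU simulation: it assumes a 64-byte cache line, buffers that start on a line boundary, and a hypothetical Vec3 standing in for glm::vec3 (three packed floats, so the struct is 36 bytes). It counts the distinct lines touched when fetching a range of vertices from one interleaved buffer versus three separate buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical stand-in for glm::vec3: three tightly packed floats (12 bytes).
struct Vec3 { float x, y, z; };

// Interleaved layout from the question: one 36-byte record per vertex.
struct Vertex { Vec3 pos, normal, color; };

constexpr std::size_t kLineSize = 64;  // assumed cache-line size in bytes

// A cache line is identified by (buffer id, line index within that buffer).
using LineSet = std::set<std::pair<int, std::size_t>>;

// Record every line touched by reading bytes [first, first + size) of a buffer.
// Each buffer is assumed to start on a cache-line boundary.
void touch(LineSet& lines, int buffer, std::size_t first, std::size_t size) {
    assert(size > 0);  // size - 1 below would underflow for an empty read
    for (std::size_t line = first / kLineSize; line <= (first + size - 1) / kLineSize; ++line)
        lines.emplace(buffer, line);
}

// Distinct lines needed to fetch vertices [begin, end) from one interleaved buffer.
std::size_t interleaved_lines(std::size_t begin, std::size_t end) {
    LineSet lines;
    for (std::size_t i = begin; i < end; ++i)
        touch(lines, 0, i * sizeof(Vertex), sizeof(Vertex));
    return lines.size();
}

// The same vertices fetched from three separate buffers (positions, normals, colors).
std::size_t separate_lines(std::size_t begin, std::size_t end) {
    LineSet lines;
    for (std::size_t i = begin; i < end; ++i)
        for (int buffer = 0; buffer < 3; ++buffer)
            touch(lines, buffer, i * sizeof(Vec3), sizeof(Vec3));
    return lines.size();
}
```

Under those assumptions, interleaved_lines(0, 1) is 1 while separate_lines(0, 1) is 3, so an isolated vertex (for example one reached through an index buffer) touches fewer lines when interleaved, although a vertex that straddles a line boundary, such as interleaved_lines(1, 2), needs 2. For a sequential run of 1024 vertices both functions return 576, because the total number of bytes is identical. In this toy model the layout only matters when vertices are fetched out of order; real GPU caches may behave quite differently, so profile.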

3 answers
Emotional °昔
#2 · 2019-05-17 18:28

Tell your supervisor "premature optimization is the root of all evil" – Donald E. Knuth. But don't forget the next sentence "but that doesn't mean we shouldn't optimize hot spots".

So did you actually profile the differences?

Anyway, the layout of your vertex data is not critical for caching efficiency on modern GPUs. It used to be on old GPUs (ca. 2000), which is why there were functions for interleaving vertex data. But these days it's pretty much a non-issue.

That has to do with the way modern GPUs access memory; in fact, modern GPUs' cache lines are not indexed by memory address but by access pattern (i.e. the first distinct memory access in a shader gets the first cache line, the second one gets the second cache line, and so on).

(Account suspended)
#3 · 2019-05-17 18:30

Depends on the GPU architecture.

Most GPUs have multiple cache lines (some for uniforms, others for vertex attributes, others for texture sampling).

Also, when the vertex shader is nearly done, the GPU can prefetch the next set of attributes into the cache, so that by the time the vertex shader finishes, the next attributes are already there, ready to be loaded into the registers.

tl;dr: don't bother with these "rules of thumb" unless you actually profile or know the actual architecture of the GPU.

别忘想泡老子
#4 · 2019-05-17 18:34

First of all, using separate buffers for different vertex attributes may not be a good technique.

A very important factor here is the GPU architecture. Most GPUs (especially modern ones) have multiple cache lines (for Input Assembler stage data, uniforms, and textures), but fetching input attributes from multiple VBOs can be inefficient anyway (always profile!). Storing them in an interleaved format can help improve performance.


And that's what you would get if you used such a struct.
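With the struct from the question, the whole array lives in one buffer and each attribute is described by a shared stride plus its own byte offset. The sketch below only computes those values with offsetof and sizeof, so it runs without an OpenGL context; the AttribFormat record, the attribute locations 0, 1, 2, and the Vec3 stand-in for glm::vec3 are hypothetical. Each entry corresponds to the arguments of one glVertexAttribPointer call.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for glm::vec3 (three tightly packed floats).
struct Vec3 { float x, y, z; };
struct Vertex { Vec3 pos, normal, color; };

// The values one would pass to
//   glVertexAttribPointer(index, components, GL_FLOAT, GL_FALSE,
//                         stride, (const void*)offset)
// for each attribute. Attribute locations 0, 1, 2 are assumptions.
struct AttribFormat {
    unsigned index;      // shader attribute location
    int components;      // floats per attribute
    std::size_t stride;  // bytes from one vertex's copy of the attribute to the next
    std::size_t offset;  // byte offset of the first element in the bound buffer
};

// Interleaved: a single buffer, one shared stride, a different offset per attribute.
std::array<AttribFormat, 3> interleaved_formats() {
    return {{
        {0, 3, sizeof(Vertex), offsetof(Vertex, pos)},
        {1, 3, sizeof(Vertex), offsetof(Vertex, normal)},
        {2, 3, sizeof(Vertex), offsetof(Vertex, color)},
    }};
}

// Separate buffers: each attribute is tightly packed at offset 0 of its own
// buffer, so a different buffer is bound before each glVertexAttribPointer call.
std::array<AttribFormat, 3> separate_formats() {
    return {{
        {0, 3, sizeof(Vec3), 0},
        {1, 3, sizeof(Vec3), 0},
        {2, 3, sizeof(Vec3), 0},
    }};
}
```

For the interleaved layout the offsets come out as 0, 12 and 24 with a stride of 36 on a typical ABI; for separate buffers every attribute has offset 0 and stride 12 (a stride of 0 would also mean tightly packed).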

However, that's not always true (again, always profile!): although interleaved data is more GPU-friendly, it needs to be properly aligned and can take significantly more space in memory.
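The padding cost is easy to see on the CPU side. In the hypothetical structs below (member names and types are made up for illustration, and the sizes assume a typical ABI where float is 4-byte aligned and uint16_t is 2-byte aligned), the compiler pads MixedVertex to 20 bytes even though its members only need 15; ordering the members from largest to smallest alignment brings it down to 16. Whether a particular GPU wants extra alignment for each attribute is platform-specific and is not modeled here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical vertex whose members are declared in an unlucky order.
struct MixedVertex {
    std::uint8_t flags;   // 1 byte, then 3 bytes of padding so pos is 4-byte aligned
    float pos[3];         // 12 bytes
    std::uint16_t id;     // 2 bytes, then 2 bytes of tail padding
};

// The same members, ordered from largest to smallest alignment.
struct ReorderedVertex {
    float pos[3];         // 12 bytes
    std::uint16_t id;     // 2 bytes
    std::uint8_t flags;   // 1 byte, then 1 byte of tail padding
};

// Bytes actually needed by the members, independent of their order.
constexpr std::size_t payload_bytes =
    sizeof(std::uint8_t) + 3 * sizeof(float) + sizeof(std::uint16_t);

// Bytes per vertex that are pure padding in each layout.
constexpr std::size_t mixed_padding = sizeof(MixedVertex) - payload_bytes;
constexpr std::size_t reordered_padding = sizeof(ReorderedVertex) - payload_bytes;
```

Multiplied over a large vertex buffer, those 4 extra bytes per vertex are exactly the kind of wasted space the paragraph above warns about.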

But, in general:

Interleaved data formats:

  • Cause less GPU cache pressure, because the vertex coordinates and attributes of a single vertex aren't scattered all over memory. They fit consecutively into a few cache lines, whereas scattered attributes could cause more cache updates and therefore more evictions. The worst-case scenario is one (attribute) element per cache line at a time because of distant memory locations, while vertices are pulled in a non-deterministic/non-contiguous order in which prediction and prefetching may not kick in at all. GPUs are very similar to CPUs in this respect.

  • Are also very useful for various external formats that match the deprecated interleaved formats (such as those accepted by the legacy glInterleavedArrays), since datasets from compatible sources can be read straight into mapped GPU memory. I ended up re-implementing these interleaved formats with the current API for exactly that reason.

  • Should be laid out in an alignment-friendly way, just like simple arrays. Mixing data types with different size/alignment requirements may need padding to be GPU- and CPU-friendly. This is the only downside I know of, apart from the more difficult implementation.

  • Do not prevent you from pointing to a single attribute array within them, for sharing.
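The last point works because an attribute inside an interleaved array is just an (offset, stride) pair, which is what glVertexAttribPointer takes. The sketch below mirrors that idea on the CPU with a hypothetical StridedView class that reads one attribute out of an interleaved std::vector<Vertex>, again with a Vec3 stand-in for glm::vec3. It copies each element with std::memcpy to avoid alignment and aliasing issues.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for glm::vec3 and the interleaved vertex from the question.
struct Vec3 { float x, y, z; };
struct Vertex { Vec3 pos, normal, color; };

// Read-only view of one attribute inside an interleaved buffer, described by
// the same (offset, stride) pair that glVertexAttribPointer uses.
template <typename T>
class StridedView {
public:
    StridedView(const void* base, std::size_t offset, std::size_t stride, std::size_t count)
        : base_(static_cast<const unsigned char*>(base) + offset),
          stride_(stride), count_(count) {}

    std::size_t size() const { return count_; }

    // Element i starts i * stride bytes after the first one.
    T operator[](std::size_t i) const {
        T value;
        std::memcpy(&value, base_ + i * stride_, sizeof(T));
        return value;
    }

private:
    const unsigned char* base_;
    std::size_t stride_;
    std::size_t count_;
};

// View of only the positions in an interleaved vertex array.
StridedView<Vec3> positions(const std::vector<Vertex>& vertices) {
    return StridedView<Vec3>(vertices.data(), offsetof(Vertex, pos),
                             sizeof(Vertex), vertices.size());
}
```

positions(vertices)[i] reads only the position bytes of vertex i; a view over the normals is the same constructor call with offsetof(Vertex, normal), so several consumers can share one interleaved buffer.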

Source

Further reading:

Best Practices for Working with Vertex Data

Vertex Specification Best Practices
