I am writing an OpenGL application, and for vertices, normals, and colors I am using separate buffers as follows:
GLuint vertex_buffer, normal_buffer, color_buffer;
My supervisor tells me that if I define a struct like:
struct vertex {
    glm::vec3 pos;
    glm::vec3 normal;
    glm::vec3 color;
};
GLuint vertex_buffer;
and then define a buffer of these vertices, my application will get much faster, because when the position is cached, the normal and color will be in the same cache line.
What I think is that defining such a struct will not affect performance much: with the struct, fewer vertices fit in each cache line, whereas with separate buffers there will be three different cache lines in the cache, one each for positions, normals, and colors. So nothing really changes. Is that true?
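The arithmetic behind this reasoning can be sketched as follows. This is only a back-of-the-envelope model, not a description of any real GPU: `Vec3` is a stand-in for `glm::vec3`, and the 64-byte line size and the helper names (`linesFor`, `interleavedLines`, `separateLines`) are assumptions made for illustration.

```cpp
#include <cstddef>

// Stand-in for glm::vec3: three tightly packed floats (12 bytes).
struct Vec3 { float x, y, z; };

// Interleaved layout from the question: 36 bytes per vertex.
struct Vertex {
    Vec3 pos;
    Vec3 normal;
    Vec3 color;
};

static_assert(sizeof(Vec3) == 12, "Vec3 is expected to be tightly packed");
static_assert(sizeof(Vertex) == 36, "Vertex is expected to have no padding");

// Assumed cache line size in bytes; real hardware may differ.
constexpr std::size_t kCacheLineBytes = 64;

// Cache lines needed for `bytes` contiguous bytes, assuming the data
// starts on a cache line boundary.
constexpr std::size_t linesFor(std::size_t bytes) {
    return (bytes + kCacheLineBytes - 1) / kCacheLineBytes;
}

// One interleaved stream holding n whole vertices.
constexpr std::size_t interleavedLines(std::size_t n) {
    return linesFor(n * sizeof(Vertex));
}

// Three separate streams (positions, normals, colors) of n elements each.
constexpr std::size_t separateLines(std::size_t n) {
    return 3 * linesFor(n * sizeof(Vec3));
}
```

Under these assumptions, `interleavedLines(16)` and `separateLines(16)` both come to 9 lines, so a sequential pass over many vertices touches about the same amount of memory either way. For a single isolated vertex the counts are 1 and 3, which is the case the supervisor's argument describes.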
Tell your supervisor "premature optimization is the root of all evil" – Donald E. Knuth. But don't forget the next sentence: "Yet we should not pass up our opportunities in that critical 3%", i.e. hot spots should still be optimized.
So did you actually profile the differences?
Anyway, the layout of your vertex data is not critical for caching efficiency on modern GPUs. It used to be on old GPUs (ca. 2000), which is why there were functions for interleaving vertex data. But these days it's pretty much a non-issue.
That has to do with the way modern GPUs access memory. In fact, modern GPUs' cache lines are not indexed by memory address but by access pattern (i.e. the first distinct memory access in a shader gets the first cache line, the second one gets the second cache line, and so on).
It depends on the GPU architecture.
Most GPUs have multiple caches (some for uniforms, others for vertex attributes, others for texture sampling).
Also, when the vertex shader is nearly done, the GPU can pre-fetch the next set of attributes into the cache, so that by the time the vertex shader finishes, the next attributes are right there, ready to be loaded into the registers.
tl;dr don't bother with these "rules of thumb" unless you actually profile or know the actual architecture of the GPU.
First of all, using separate buffers for different vertex attributes may not be a good technique.
A very important factor here is the GPU architecture. Most (especially modern) GPUs have multiple caches (for Input Assembler stage data, uniforms, textures), but fetching input attributes from multiple VBOs can be inefficient anyway (always profile!). Defining them in an interleaved format can help improve performance:
And that's what you would get if you used such a struct.
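As a rough sketch of what the two layouts mean in terms of stride and offset, the tables below compare them. `AttribLayout` is a hypothetical descriptor introduced only for this illustration, `Vec3` stands in for `glm::vec3`, and the attribute locations 0 to 2 are assumed; only `glVertexAttribPointer`, mentioned in the comment, is an actual OpenGL call.

```cpp
#include <cstddef>

// Stand-in for glm::vec3 (three packed floats).
struct Vec3 { float x, y, z; };

// The struct from the question.
struct Vertex {
    Vec3 pos;
    Vec3 normal;
    Vec3 color;
};

// Hypothetical descriptor mirroring the index, size, stride and pointer
// arguments of glVertexAttribPointer.
struct AttribLayout {
    unsigned index;      // attribute location (assumed 0..2 here)
    int components;      // number of floats per attribute
    std::size_t stride;  // bytes between consecutive vertices
    std::size_t offset;  // byte offset of the first element in its buffer
};

// Interleaved: one buffer, stride is the whole struct, offsets differ.
constexpr AttribLayout kInterleaved[3] = {
    {0, 3, sizeof(Vertex), offsetof(Vertex, pos)},
    {1, 3, sizeof(Vertex), offsetof(Vertex, normal)},
    {2, 3, sizeof(Vertex), offsetof(Vertex, color)},
};

// Separate: three buffers, each tightly packed and starting at offset 0.
constexpr AttribLayout kSeparate[3] = {
    {0, 3, sizeof(Vec3), 0},
    {1, 3, sizeof(Vec3), 0},
    {2, 3, sizeof(Vec3), 0},
};

// With a GL context and the right buffer bound, each entry `a` would be
// passed along the lines of:
//   glVertexAttribPointer(a.index, a.components, GL_FLOAT, GL_FALSE,
//                         a.stride, (const void*)a.offset);
```

In the interleaved case a single buffer stays bound for all three calls. In the separate case a different buffer has to be bound before each call.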
However, that's not always true (again, always profile!): although interleaved data is more GPU-friendly, it needs to be properly aligned and can take significantly more space in memory.
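The memory cost of alignment can be estimated with a small sketch. The 16-byte alignment used here is purely an assumption for illustration (what a given GPU or driver actually prefers has to be checked), and `PackedVertex`, `PaddedVertex` and `bufferBytes` are hypothetical names.

```cpp
#include <cstddef>

// Stand-in for glm::vec3 (three packed floats).
struct Vec3 { float x, y, z; };

// Tightly packed interleaved vertex: 3 * 12 = 36 bytes.
struct PackedVertex {
    Vec3 pos;
    Vec3 normal;
    Vec3 color;
};

// Same fields, but the whole struct is aligned to 16 bytes (an assumed
// requirement), so its size is rounded up from 36 to 48 bytes.
struct alignas(16) PaddedVertex {
    Vec3 pos;
    Vec3 normal;
    Vec3 color;
};

static_assert(sizeof(PackedVertex) == 36, "no padding expected");
static_assert(sizeof(PaddedVertex) == 48, "padded up to a multiple of 16");

// Total buffer size in bytes for `count` vertices of `vertexSize` bytes.
constexpr std::size_t bufferBytes(std::size_t vertexSize, std::size_t count) {
    return vertexSize * count;
}
```

With these numbers, one million vertices take 36,000,000 bytes packed and 48,000,000 bytes padded, i.e. a third more memory for the same data.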
But, in general:
Source
Further reading:
Best Practices for Working with Vertex Data
Vertex Specification Best Practices