When I read about performance in OpenGL/WebGL, I almost always hear about reducing draw calls. My problem is that I am using only 4 vertices to draw a textured quad, so my VBO generally contains only 4 vertices. Basically:
gl.bindBuffer(gl.ARRAY_BUFFER,vbo);
gl.uniformMatrix4fv(matrixLocation, false, modelMatrix);
gl.drawArrays(gl.TRIANGLE_FAN,0, vertices.length/3);
And here comes the problem I see. Before drawing, I update the model matrix of the current quad, for example to move it 5 units along the y axis.
So what I have to do is:
gl.bindBuffer(gl.ARRAY_BUFFER,vbo);
gl.uniformMatrix4fv(matrixLocation, false, modelMatrix);
gl.drawArrays(gl.TRIANGLE_FAN, 0, vertices.length/3);
gl.uniformMatrix4fv(matrixLocation, false, anotherModelMatrix);
gl.drawArrays(gl.TRIANGLE_FAN,0, vertices.length/3);
// ... repeat until all textures are rendered
How can I reduce the number of draw calls, ideally to just one?
The first question is, does it matter?
If you're making fewer than 1000, maybe even 2000, draw calls it probably doesn't matter. At that scale, ease of use matters more than squeezing out draw calls.
If you really need lots of quads then there are a bunch of solutions. One is to put N quads into a single buffer. See this presentation. Then put position, rotation, and scale either into other buffers or into a texture and compute the matrices inside your shader.
In other words, for a textured quad people usually put vertex position and texcoords in buffers ordered like this
Instead you'd do this
p0 - p5 are just unit quad values, p6 - p11 are the same values, p12 - p17 are again the same values. t0 - t5 are unit texcoord values, t6 - t11 are the same texcoord values. etc.
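As a rough sketch of that layout, assuming 3-component positions, a unit quad spanning 0 to 1, and two triangles (6 vertices) per quad, the repeated buffers could be built like this. `makeUnitQuadBuffers` is a hypothetical helper name, not something from WebGL itself.

```javascript
// Hypothetical helper: repeats one unit quad (2 triangles, 6 vertices)
// numQuads times so that all quads live in a single pair of buffers.
function makeUnitQuadBuffers(numQuads) {
  // Assumed vertex order: two triangles covering the 0..1 square, z = 0.
  const unitPositions = [0, 0, 0,  1, 0, 0,  0, 1, 0,
                         0, 1, 0,  1, 0, 0,  1, 1, 0];
  const unitTexcoords = [0, 0,  1, 0,  0, 1,
                         0, 1,  1, 0,  1, 1];
  const positions = new Float32Array(numQuads * unitPositions.length);
  const texcoords = new Float32Array(numQuads * unitTexcoords.length);
  for (let q = 0; q < numQuads; ++q) {
    // p0-p5, p6-p11, ... all hold the same unit quad values.
    positions.set(unitPositions, q * unitPositions.length);
    texcoords.set(unitTexcoords, q * unitTexcoords.length);
  }
  return { positions, texcoords };
}
```

Each array would then be uploaded once with `gl.bufferData` and never touched again, since only the per-quad data changes.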
Then you add more buffers. Let's imagine all we want is world position and scale. So we add 2 more buffers
Notice how the scale repeats 6 times, once for each vertex of the first quad. Then it repeats again 6 times for the next quad, etc. The same goes for world position. That's so all 6 vertices of a single quad share the same world position and the same scale.
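One way to build those repeated buffers is to keep one value per quad on the JavaScript side and expand it to 6 copies. This is only a sketch; `expandPerQuad` and its parameters are made-up names for illustration.

```javascript
// Hypothetical helper: takes a flat array with one value per quad
// (e.g. [s0x, s0y, s0z, s1x, s1y, s1z, ...]) and repeats each value
// once per vertex so every vertex of a quad sees the same data.
function expandPerQuad(perQuadValues, numComponents, verticesPerQuad = 6) {
  const numQuads = perQuadValues.length / numComponents;
  const out = new Float32Array(numQuads * verticesPerQuad * numComponents);
  for (let q = 0; q < numQuads; ++q) {
    for (let v = 0; v < verticesPerQuad; ++v) {
      const dst = (q * verticesPerQuad + v) * numComponents;
      for (let c = 0; c < numComponents; ++c) {
        out[dst + c] = perQuadValues[q * numComponents + c];
      }
    }
  }
  return out;
}

// Usage: two quads with different scales, 3 components each.
const scales = expandPerQuad([1, 2, 3,  4, 5, 6], 3);
```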
Now in the shader we can use those like this
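A minimal vertex shader along those lines might look like the following, written as a JavaScript string ready for `gl.shaderSource`. The attribute names follow the text above; the `viewProjection` uniform and the `v_texcoord` varying are assumptions for the sake of the example, and your own shader will likely differ.

```javascript
// Sketch of a WebGL1 (GLSL ES 1.00) vertex shader that builds each
// vertex from the unit quad plus per-quad scale and world position.
const vertexShaderSource = `
attribute vec3 position;       // unit quad vertex, same for every quad
attribute vec2 texcoord;       // unit texcoord, same for every quad
attribute vec3 worldPosition;  // repeated for all 6 vertices of a quad
attribute vec3 scale;          // repeated for all 6 vertices of a quad

uniform mat4 viewProjection;   // assumed: projection * view

varying vec2 v_texcoord;

void main() {
  // Scale the unit quad, then move it to where this quad lives.
  vec3 worldPos = worldPosition + position * scale;
  gl_Position = viewProjection * vec4(worldPos, 1.0);
  v_texcoord = texcoord;       // pass on to the fragment shader
}
`;
```

No per-quad uniform is left, so nothing has to change between quads and they can all go out in one draw call.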
Now anytime we want to set the position of a quad we need to set the 6 world positions (one for each of the 6 vertices) in the corresponding buffer.
Generally you can update all the world positions, then make 1 call to gl.bufferData to upload all of them. The demo that accompanied this answer drew 100k quads that way.
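Roughly, the per-frame update could look like this. `setQuadWorldPosition` and `uploadWorldPositions` are hypothetical helper names, and `gl.DYNAMIC_DRAW` is just a reasonable usage hint for data that changes often, not a requirement.

```javascript
// Hypothetical helper: writes the same world position into all
// 6 vertices of one quad inside the CPU-side Float32Array.
function setQuadWorldPosition(worldPositions, quadIndex, x, y, z) {
  const base = quadIndex * 6 * 3; // 6 vertices, 3 floats each
  for (let v = 0; v < 6; ++v) {
    const o = base + v * 3;
    worldPositions[o + 0] = x;
    worldPositions[o + 1] = y;
    worldPositions[o + 2] = z;
  }
}

// Hypothetical helper: one upload for all quads, no matter how many moved.
function uploadWorldPositions(gl, buffer, worldPositions) {
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, worldPositions, gl.DYNAMIC_DRAW);
}
```

In a render loop you would call `setQuadWorldPosition` for every quad that moved, then `uploadWorldPositions` once, then a single `gl.drawArrays(gl.TRIANGLES, 0, numQuads * 6)`.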
You can reduce the number of repeated values per quad from 6 to 1 by using the ANGLE_instanced_arrays extension. It's not quite as fast as the technique above but it's pretty close.
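Here is a sketch of what the instanced path might look like in WebGL1. It assumes the attributes are already set up with `gl.vertexAttribPointer` and enabled, that the position/texcoord buffers hold a single 6-vertex unit quad, and that the per-quad buffers hold one value per quad. `drawQuadsInstanced` and `perInstanceLocations` are made-up names.

```javascript
// Hypothetical helper: draws numQuads instances of one 6-vertex quad.
// perInstanceLocations are the attribute locations (e.g. worldPosition,
// scale) that should advance once per quad instead of once per vertex.
function drawQuadsInstanced(gl, perInstanceLocations, numQuads) {
  const ext = gl.getExtension('ANGLE_instanced_arrays');
  if (!ext) {
    return false; // caller falls back to the repeated-data approach
  }
  for (const location of perInstanceLocations) {
    ext.vertexAttribDivisorANGLE(location, 1); // advance once per instance
  }
  // mode, first vertex, vertices per instance, number of instances
  ext.drawArraysInstancedANGLE(gl.TRIANGLES, 0, 6, numQuads);
  return true;
}
```

Keep in mind the divisor is attribute state, so if other draws share those attribute locations you would set the divisor back to 0 afterwards.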
You can also reduce the amount of data from 6 to 1 by storing the world positions and scale in a texture. In that case instead of the 2 extra buffers you add one extra buffer with just a repeated id
The id repeats 6 times, once for each of the 6 vertices of each quad.
You then use that id to compute a texture coordinate to lookup world position and scale.
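The lookup could be sketched like this, assuming one texel per quad, laid out row by row, and a `textureSize` uniform holding the texture's width and height (both are assumptions, as are all the names). The JavaScript function mirrors the shader math so you can sanity check it on the CPU.

```javascript
// Sketch of the GLSL ES 1.00 snippet for the vertex shader.
// Requires the GPU to support vertex texture fetch
// (gl.MAX_VERTEX_TEXTURE_IMAGE_UNITS > 0).
const lookupSnippet = `
attribute float id;
uniform sampler2D worldPositionTexture;
uniform vec2 textureSize;   // assumed: width, height in texels

vec3 getWorldPosition() {
  float x = mod(id, textureSize.x);
  float y = floor(id / textureSize.x);
  vec2 uv = (vec2(x, y) + 0.5) / textureSize;  // center of the texel
  return texture2D(worldPositionTexture, uv).xyz;
}
`;

// Same computation in JavaScript, for checking the addressing.
function idToUV(id, width, height) {
  const x = id % width;
  const y = Math.floor(id / width);
  return [(x + 0.5) / width, (y + 0.5) / height];
}
```

The `+ 0.5` aims at the center of the texel so neighboring texels don't bleed into the result.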
Now you need to put your world positions in a texture. You probably want a floating point texture to make it easy. You can do similar things for scale etc. and either store each in a separate texture or all in the same texture, changing your uv calculation appropriately.
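A sketch of the upload for WebGL1, assuming one RGBA texel per quad with the position in xyz and the `OES_texture_float` extension available. `packWorldPositions` and `uploadFloatTexture` are hypothetical helpers.

```javascript
// Hypothetical helper: packs [x0, y0, z0, x1, y1, z1, ...] into RGBA
// texels, one quad per texel, row by row in a texture `width` wide.
function packWorldPositions(perQuadPositions, width) {
  const numQuads = perQuadPositions.length / 3;
  const height = Math.ceil(numQuads / width);
  const data = new Float32Array(width * height * 4); // unused texels stay 0
  for (let q = 0; q < numQuads; ++q) {
    data[q * 4 + 0] = perQuadPositions[q * 3 + 0];
    data[q * 4 + 1] = perQuadPositions[q * 3 + 1];
    data[q * 4 + 2] = perQuadPositions[q * 3 + 2];
  }
  return { data, width, height };
}

// Hypothetical helper: uploads the packed data as a float texture.
function uploadFloatTexture(gl, texture, packed) {
  if (!gl.getExtension('OES_texture_float')) {
    throw new Error('OES_texture_float is not supported');
  }
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, packed.width, packed.height, 0,
                gl.RGBA, gl.FLOAT, packed.data);
  // No filtering or wrapping: we want exact texel values.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
}
```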
Note that, at least on my machine, doing it through a texture is slower than doing it through buffers, so while it's less work for JavaScript (only one worldPosition to update per quad) it's apparently more work for the GPU. The buffer version runs at 60fps for me with 100k quads whereas the texture version ran at about 40fps with 100k quads. I lowered it to 50k but of course those numbers are for my machine. Other machines will vary.
Techniques like this will allow you to have way more quads but it comes at the expense of flexibility. You can only manipulate them in ways you provided in your shader. For example if you want to be able to scale from different origins (center, top-left, bottom-right, etc) you'd need to add yet another piece of data or set the positions. If you wanted to rotate you'd need to add rotation data, etc...
You could even pass in whole matrices per quad but then you'd be uploading 16 floats per quad. It still might be faster though since you're already doing that when calling gl.uniformMatrix4fv, but you'd be doing just 2 calls: gl.bufferData or gl.texImage2D to upload the new matrices, and then gl.drawXXX to draw.
Yet another issue is you mentioned textures. If you're using a different texture per quad then you need to figure out how to convert them to a texture atlas (all the images in one texture), in which case your UV coordinates would not repeat as they do above.
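To give an idea of what non-repeating UVs might look like, here is a sketch that assumes the simplest possible atlas: a uniform grid of equally sized cells, numbered row by row. Real atlases are often packed irregularly, and `atlasTexcoords` is a made-up helper.

```javascript
// Hypothetical helper: returns the 6 texcoords (12 floats) for one quad
// so that it samples only its own cell of a cols x rows grid atlas.
// Vertex order assumed: (0,0) (1,0) (0,1)  (0,1) (1,0) (1,1).
function atlasTexcoords(cellIndex, cols, rows) {
  const col = cellIndex % cols;
  const row = Math.floor(cellIndex / cols);
  const u0 = col / cols;
  const v0 = row / rows;
  const u1 = (col + 1) / cols;
  const v1 = (row + 1) / rows;
  return [u0, v0,  u1, v0,  u0, v1,
          u0, v1,  u1, v0,  u1, v1];
}
```

Each quad would then get its own 12 floats in the texcoord buffer instead of the repeated unit values.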