I have 10-bit YUV (V210) video frames coming in from a capture card, and I would like to unpack this data inside of a GLSL shader and ultimately convert to RGB for screen output. I'm using a Quadro 4000 card on Linux (OpenGL 4.3).
I am uploading the texture with the following settings:
video frame: 720x486 pixels
physically occupies 933120 bytes in 128-byte aligned memory (stride of 1920)
texture is currently uploaded as 480x486 pixels (stride/4 x height) since this matches the byte count of the data
internalFormat of GL_RGB10_A2
format of GL_RGBA
type of GL_UNSIGNED_INT_2_10_10_10_REV
filtering is currently set to GL_NEAREST
Here is the upload command for clarity:
// V210 rows pad to 128-byte boundaries: 6 pixels -> 16 bytes, so a 720-pixel row occupies ((720 + 47) / 48) * 128 = 1920 bytes.
int stride = ((m_videoWidth + 47) / 48) * 128;
// Each 32-bit word becomes one RGB10_A2 texel, hence stride/4 texels per row.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB10_A2, stride / 4, m_videoHeight, 0, GL_RGBA, GL_UNSIGNED_INT_2_10_10_10_REV, bytes);
The data itself is packed like so:
U Y V A | Y U Y A | V Y U A | Y V Y A
Or see Blackmagic's illustration here: http://i.imgur.com/PtXBJbS.png
Each texel is 32 bits total (10 bits each for the "R", "G" and "B" channels and 2 bits for alpha). Where it gets complicated is that 6 video pixels are packed into each 128-bit block of four such texels. These blocks simply repeat the above pattern until the end of the frame.
I know that the components of each texel can be accessed with texture2D(tex, coord).rgb, but since the order is not the same for every texel (e.g. UYV vs. YUY), the texture coordinates must be manipulated to account for that.
However, I'm not sure how to deal with the fact that more pixels are packed into this texture than the GL knows about. I believe this means I have to handle scaling up/down, as well as min/mag filtering (I need bilinear), inside my shader. The output window needs to be able to be any size (smaller than, the same as, or larger than the texture), so the shader should not contain any constants related to that.
How can I accomplish this?
Here is the completed shader with all channels and RGB conversion (however, no filtering is performed):
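A minimal sketch of what such a shader can look like (illustrative rather than an exact listing: it assumes the upload described above, a hypothetical imageSize uniform supplied by the application so nothing is hard-coded, and BT.601 limited-range constants, a common choice for SD material):

#version 430

uniform sampler2D tex;    // the 480x486 GL_RGB10_A2 texture
uniform vec2 imageSize;   // hypothetical uniform, e.g. (720.0, 486.0)
in vec2 vTexCoord;        // 0..1 across the full frame (name assumed)
out vec4 fragColor;

void main()
{
    // Integer pixel position in the 720x486 output image.
    int x = clamp(int(vTexCoord.x * imageSize.x), 0, int(imageSize.x) - 1);
    int y = clamp(int(vTexCoord.y * imageSize.y), 0, int(imageSize.y) - 1);

    int group = x / 6;   // which 6-pixel / 4-texel block
    int i     = x % 6;   // pixel position within the block

    // The four 32-bit words of the block, following the pattern above:
    vec3 w0 = texelFetch(tex, ivec2(group * 4 + 0, y), 0).rgb; // U  Y0 V
    vec3 w1 = texelFetch(tex, ivec2(group * 4 + 1, y), 0).rgb; // Y1 U  Y2
    vec3 w2 = texelFetch(tex, ivec2(group * 4 + 2, y), 0).rgb; // V  Y3 U
    vec3 w3 = texelFetch(tex, ivec2(group * 4 + 3, y), 0).rgb; // Y4 V  Y5

    // Pick the luma sample and the 4:2:2 chroma pair for this pixel.
    float Y, U, V;
    if      (i == 0) { Y = w0.g; U = w0.r; V = w0.b; }
    else if (i == 1) { Y = w1.r; U = w0.r; V = w0.b; }
    else if (i == 2) { Y = w1.b; U = w1.g; V = w2.r; }
    else if (i == 3) { Y = w2.g; U = w1.g; V = w2.r; }
    else if (i == 4) { Y = w3.r; U = w2.b; V = w3.g; }
    else             { Y = w3.b; U = w2.b; V = w3.g; }

    // Undo 10-bit limited-range coding (luma 64..940, chroma centered on 512),
    // then convert BT.601 YCbCr to RGB.
    float Yn = (Y * 1023.0 -  64.0) / 876.0;
    float Un = (U * 1023.0 - 512.0) / 896.0;
    float Vn = (V * 1023.0 - 512.0) / 896.0;

    fragColor = vec4(Yn + 1.402 * Vn,
                     Yn - 0.344136 * Un - 0.714136 * Vn,
                     Yn + 1.772 * Un,
                     1.0);
}

Note that the chroma here is nearest-sample (no 4:2:2 interpolation), matching the "no filtering" caveat.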
If the image is going to be scaled up on the screen then you will likely want to do bilinear filtering, but this will need to be performed within the shader.
I recommend first writing a shader that only does the pixel reordering, rendering its output to an intermediate texture, and keeping interpolation out of it.
This does require extra video RAM and another render pass, but it is not necessarily slower: if you fold rescaling into the same shader, you need to compute the contents of 4 intermediate pixels and then interpolate between them. If you make a separate shader for the interpolation instead, it can be as simple as returning the result of a single texture lookup with hardware interpolation, as sketched below.
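That second pass can be trivial; a sketch, assuming the rearranged frame from the first pass is bound as reorderedTex with GL_LINEAR min/mag filtering (the names are made up):

#version 430

uniform sampler2D reorderedTex; // full 720x486 RGB result of the first pass
in vec2 vTexCoord;
out vec4 fragColor;

void main()
{
    // Hardware bilinear filtering does all the scaling work here.
    fragColor = texture(reorderedTex, vTexCoord);
}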
Once you have a correct shader for the color sample rearrangement, you can always turn it into a function.
So then how to write the rearrangement shader?
Your inputs look like this (the repeating pattern from above):
U Y V A | Y U Y A | V Y U A | Y V Y A
Let's for simplicity assume you only want to read Y. Then you can make a very simple 1D lookup texture (720 columns x 1 row). Each texture cell holds two values: first, the source column to read from; second, the location of the Y sample within that source texel.
To get the Y (luminance) value, you index the row texture with the screen x position. The first value tells you which source texel to read; the second value selects the correct sample within it. In DirectX you can simply index a vec4/float4 with an integer to select the R/G/B/A component, and GLSL supports the same (texel[i]). So now you have Y; repeat the same process for U and V. A sketch of the Y case follows below.
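For illustration, a GLSL sketch of the Y lookup (the lumaIdx texture and its GL_RG16I layout are assumptions of this sketch: R holds the source column, G the component index):

#version 430

uniform sampler2D  srcTex;   // the 480x486 GL_RGB10_A2 source texture
uniform isampler2D lumaIdx;  // hypothetical 720x1 GL_RG16I lookup texture
in vec2 vTexCoord;
out vec4 fragColor;

void main()
{
    int x = int(vTexCoord.x * 720.0);  // constants for brevity; use uniforms
    int y = int(vTexCoord.y * 486.0);

    ivec2 idx   = texelFetch(lumaIdx, ivec2(x, 0), 0).rg; // (column, component)
    vec4  texel = texelFetch(srcTex, ivec2(idx.x, y), 0);

    float Y = texel[idx.y];            // GLSL allows integer indexing of a vec4
    fragColor = vec4(vec3(Y), 1.0);    // luma only for now
}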
Once you get it working, you can try to optimize by smartly packing the above information into a single texture rather than three different ones. Or you can try to find a simple arithmetic function that, after rounding, produces the column indices directly; that will save you many texture lookups.
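For example (a sketch: the constant tables simply encode the U Y V A | Y U Y A | V Y U A | Y V Y A pattern, so this replaces the lookup texture for Y):

// Returns (source column, component index) for the luma sample of pixel x.
// Drop-in replacement for the lumaIdx texture fetch in the sketch above.
ivec2 lumaIndex(int x)
{
    const int word[6] = int[6](0, 1, 1, 2, 3, 3); // 32-bit word within the block
    const int comp[6] = int[6](1, 0, 2, 1, 0, 2); // channel within that word
    int group = x / 6;
    int i     = x % 6;
    return ivec2(group * 4 + word[i], comp[i]);
}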
But maybe the whole optimization is a moot point in your scenario. Just get the simplest case working first.
I purposely didn't write the complete shader for you, as I am mostly familiar with DirectX. But this should get you started.
Good luck!