I am building an augmented reality application that overlays 3D objects on top of the color video of the user. Kinect version 1.7 is used, and the virtual objects are rendered in OpenGL. I have managed to overlay 3D objects on the depth video successfully, simply by taking the intrinsic constants for the depth camera from the NuiSensor.h header and computing a projection matrix based on the formula at http://ksimek.github.io/2013/06/03/calibrated_cameras_in_opengl/. The 3D objects rendered with this projection matrix overlay exactly with the 2D skeleton points in depth space. This is not surprising, since the skeleton 3D points are computed from depth space, and it gives me confidence that a projection matrix computed outside the Kinect SDK works.
Here is the code that computes the projection matrix from the intrinsic constants, and how it is used:
#include <glm/glm.hpp>
#include <glm/gtc/type_ptr.hpp> //for glm::value_ptr used below

glm::mat4 GetOpenGLProjectionMatrixFromCameraIntrinsics(float alpha, float beta, float skew, float u0, float v0,
                                                        int img_width, int img_height, float near_clip, float far_clip)
{
    float L = 0;
    float R = (float)img_width;
    float B = 0;
    float T = (float)img_height;
    float N = near_clip;
    float F = far_clip;

    glm::mat4 ortho = glm::mat4(0);
    glm::mat4 proj = glm::mat4(0);

    //Both matrices are written below in textbook row-major layout for
    //readability; since glm actually stores matrices column major, what
    //sits in memory is the transpose of each matrix.
    ortho[0][0] = 2.0f/(R-L);  ortho[0][3] = -(R+L)/(R-L);
    ortho[1][1] = 2.0f/(T-B);  ortho[1][3] = -(T+B)/(T-B);
    ortho[2][2] = -2.0f/(F-N); ortho[2][3] = -(F+N)/(F-N);
    ortho[3][3] = 1;

    proj[0][0] = alpha; proj[0][1] = skew; proj[0][2] = -u0;
    proj[1][1] = beta;  proj[1][2] = -v0;
    proj[2][2] = (N+F); proj[2][3] = (N*F);
    proj[3][2] = -1;

    //Because both operands are stored transposed, proj*ortho evaluates to
    //transpose(ortho*proj); transposing once more yields ortho*proj in
    //proper column-major layout, ready for glLoadMatrixf().
    return glm::transpose(proj*ortho);
}
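The intrinsic inputs themselves are presumably filled in from the SDK's nominal constants along these lines (a sketch of my setup; NUI_CAMERA_DEPTH_NOMINAL_FOCAL_LENGTH_IN_PIXELS is specified for 320x240, and I assume the principal point sits at the image center):
//285.63 is defined for NUI_IMAGE_RESOLUTION_320x240, so scale it up to the 1280x960 window
const float scale = WIN_WIDTH / 320.0f; //4.0
m_fx = m_fy = NUI_CAMERA_DEPTH_NOMINAL_FOCAL_LENGTH_IN_PIXELS * scale; //285.63 * 4 = 1142.52
m_skew = 0.0f;
m_PPx0 = WIN_WIDTH / 2.0f;  //640.0
m_PPy0 = WIN_HEIGHT / 2.0f; //480.0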
//Compute projection matrix of Kinect camera
m_3DProjectionMatrix = GetOpenGLProjectionMatrixFromCameraIntrinsics(m_fx, m_fy, m_skew, m_PPx0, m_PPy0, WIN_WIDTH, WIN_HEIGHT, 0.01f, 10);
//The inputs are m_fx = 1142.52, m_fy = 1142.52, m_skew = 0.00, m_PPx0 = 640.00, m_PPy0 = 480.00, WIN_WIDTH = 1280 and WIN_HEIGHT = 960. These numbers are derived from NuiImageCamera.h for the depth camera.
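As a quick sanity check (a hypothetical snippet; skelPos is assumed to be a Vector4 joint position obtained from NuiSkeletonGetNextFrame()), a skeleton point can be pushed through this matrix on the CPU and compared against the SDK's own depth-space mapping:
glm::vec4 clip = m_3DProjectionMatrix * glm::vec4(skelPos.x, skelPos.y, -skelPos.z, 1.0f); //negate z: the GL camera looks down -z
float winX = (clip.x / clip.w * 0.5f + 0.5f) * WIN_WIDTH;
float winY = (clip.y / clip.w * 0.5f + 0.5f) * WIN_HEIGHT; //bottom-left origin
FLOAT depthX = 0, depthY = 0;
NuiTransformSkeletonToDepthImage(skelPos, &depthX, &depthY, NUI_IMAGE_RESOLUTION_640x480);
//depth coords are top-left origin at 640x480: rescale (and flip y, depending on your conventions) before comparing
printf("GL: (%.1f, %.1f) vs SDK: (%.1f, %.1f)\n", winX, winY, depthX * 2.0f, WIN_HEIGHT - depthY * 2.0f);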
Here is how the 2D points are drawn:
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, WIN_WIDTH, WIN_HEIGHT, 0, 0.0, 1.0);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
Draw2DSkeletonRGBPoints();//Uses NuiTransformSkeletonToDepthImage() followed by NuiImageGetColorPixelCoordinatesFromDepthPixel()
Draw2DSkeletonDepthPoints();//Uses NuiTransformSkeletonToDepthImage() only
Followed by the 3D points:
glMatrixMode(GL_PROJECTION);
glLoadMatrixf(glm::value_ptr(m_3DProjectionMatrix));
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
Draw3DSkeletonPoints();//The Skeleton 3D coordinates from Kinect
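For completeness, Draw2DSkeletonDepthPoints() boils down to something like this (a simplified sketch; skeleton frame acquisition is omitted and skel is a hypothetical NUI_SKELETON_DATA for one tracked player):
void Draw2DSkeletonDepthPoints(const NUI_SKELETON_DATA& skel)
{
    glBegin(GL_POINTS);
    for (int j = 0; j < NUI_SKELETON_POSITION_COUNT; ++j)
    {
        FLOAT x = 0, y = 0;
        NuiTransformSkeletonToDepthImage(skel.SkeletonPositions[j], &x, &y, NUI_IMAGE_RESOLUTION_640x480);
        //the glOrtho used for the 2D pass covers the 1280x960 window, so scale the 640x480 depth coords up by 2
        glVertex2f(x * 2.0f, y * 2.0f);
    }
    glEnd();
}
Draw2DSkeletonRGBPoints() additionally runs each depth pixel through NuiImageGetColorPixelCoordinatesFromDepthPixel() before drawing.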
However, overlaying virtual objects on top of the color video is not that straightforward. There seems to be some translation, scaling, or even a slight rotation between color and depth space. I know there is an SDK function that converts a skeleton point to a color point, but it cannot be used easily for OpenGL rendering; what I need is a transformation matrix that maps 3D skeleton points from skeleton coordinate space into 3D points with the color camera as the origin. Does anyone know how to go about computing this transformation matrix? Where can I find more information about doing this?
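For what it's worth, one direction I have been considering (a rough sketch, not a working solution: it assumes OpenCV is available and that the nominal color focal length constant from NuiImageCamera.h is close enough to the truth) is to let the SDK itself generate 3D-to-2D correspondences and then solve a PnP problem for the color camera's pose:
#include <opencv2/calib3d/calib3d.hpp>
#include <glm/glm.hpp>
#include <vector>

//For each sampled skeleton-space point X (e.g. joints collected over many frames):
//  1. NuiTransformSkeletonToDepthImage()               -> depth pixel + packed depth value
//  2. NuiImageGetColorPixelCoordinatesFromDepthPixel() -> matching color pixel (u, v)
//objectPoints holds the X's, imagePoints the matching color pixels.
glm::mat4 EstimateSkeletonToColorTransform(const std::vector<cv::Point3f>& objectPoints,
                                           const std::vector<cv::Point2f>& imagePoints,
                                           float fColor, float cx, float cy)
{
    cv::Mat K = (cv::Mat_<double>(3, 3) << fColor, 0, cx,
                                           0, fColor, cy,
                                           0, 0, 1);
    cv::Mat rvec, tvec, R;
    cv::solvePnP(objectPoints, imagePoints, K, cv::Mat(), rvec, tvec);
    cv::Rodrigues(rvec, R);
    //Pack [R|t] into a glm matrix (glm is column major, i.e. m[col][row])
    glm::mat4 M(1.0f);
    for (int r = 0; r < 3; ++r)
    {
        for (int c = 0; c < 3; ++c)
            M[c][r] = (float)R.at<double>(r, c);
        M[3][r] = (float)tvec.at<double>(r);
    }
    return M;
}
Note that OpenCV's camera frame has +z pointing forward, while the projection matrix above expects the camera to look down -z, so a z-flip would still be needed between the estimated [R|t] and the GL modelview. I have not verified whether the nominal intrinsics are accurate enough for this to converge, which is partly why I am asking.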