I am trying to implement the (relatively simple) linear homogeneous (DLT) 3D triangulation method from Hartley & Zisserman's "Multiple View Geometry" (sec 12.2), with the aim of implementing their full "optimal algorithm" in the future. Right now, based on this question, I'm trying to get it to work in Matlab, and will later port it to C++ and OpenCV, testing for conformity along the way.
The problem is that I'm unsure how to use the data I have. I have calibrated my stereo rig, and obtained the two intrinsic camera matrices, two vectors of distortion coefficients, the rotation matrix and translation vector relating the two cameras, as well as the essential and fundamental matrices. I also have the 2D coordinates of two points that are supposed to be correspondences of a single 3D point in the coordinate systems of the two images (taken by the 1st and 2nd camera respectively).
The algorithm takes as input the two point coordinates and two 3x4 "camera matrices" P and P'. These are obviously not the intrinsic camera matrices (M, M') obtained from the calibration: for one, those are 3x3, and also projecting with an intrinsic matrix alone expresses each point in its own camera's coordinate frame - the extrinsic (rotation/translation) data is missing.
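To make sure I have the linear step itself right, here is my understanding of the homogeneous DLT as a sketch (in NumPy for brevity, since I'll be porting anyway; the function name is mine):

```python
import numpy as np

def triangulate_dlt(x1, x2, P1, P2):
    """Homogeneous DLT triangulation (H&Z sec 12.2).
    x1, x2: 2D image points (u, v); P1, P2: 3x4 camera matrices."""
    # Each view contributes two rows of A, derived from x cross (P X) = 0.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with the smallest
    # singular value (the unit-norm least-squares solution of A X = 0).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

(The SVD route handles the noisy, overdetermined case; with exact correspondences the last row of Vt is the null space of A.)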
The H&Z book contains information (chapter 9) on recovering the required matrices from either the fundamental or the essential matrix via SVD, but that route brings additional problems of its own (e.g. scale ambiguity). I feel I don't need it, since I have the rotation and translation explicitly defined.
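For completeness, my understanding of the chapter-9 route I'd like to avoid is roughly this (a NumPy sketch of H&Z Result 9.19; only the four-candidate decomposition, omitting the cheirality test that picks the candidate placing points in front of both cameras):

```python
import numpy as np

def camera_candidates_from_E(E):
    """The four (R, t) candidates from an essential matrix (H&Z result 9.19).
    The translation is recovered only up to scale, and the correct pair must
    be selected by triangulating a point and testing it lies in front of
    both cameras."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations (det = +1); E is only defined up to sign.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])
    t = U[:, 2]  # translation direction, up to sign and scale
    return [(U @ W @ Vt,  t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]
```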
The question then is: would it be correct to use the first intrinsic matrix, augmented with a column of zeros, as the first "camera matrix" (P = [M|0]), and to multiply the second intrinsic matrix by an extrinsic matrix composed of the rotation matrix and the translation vector as an extra column, to obtain the second required "camera matrix" (P' = M'*[R|t])? Or should it be done differently?
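In code, the construction I'm proposing looks like this (NumPy sketch, variable names mine; it assumes R and t map points from the first camera's frame to the second's, i.e. the first camera is the world origin):

```python
import numpy as np

def camera_matrices(M1, M2, R, t):
    """Build the two 3x4 projection matrices from stereo calibration output,
    taking the first camera's frame as the world frame:
        P  = M1 [I | 0],   P' = M2 [R | t]."""
    P1 = M1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = M2 @ np.hstack([R, t.reshape(3, 1)])
    return P1, P2
```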
Thanks!