I accessed calibration files from part odometry of KITTI, wherein contents of one calibration file are as follows:
P0: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 0.000000000000e+00 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P1: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 -3.861448000000e+02 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P2: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 4.538225000000e+01 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 -1.130887000000e-01 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 3.779761000000e-03
P3: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 -3.372877000000e+02 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 2.369057000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 4.915215000000e-03
Tr: 4.276802385584e-04 -9.999672484946e-01 -8.084491683471e-03 -1.198459927713e-02 -7.210626507497e-03 8.081198471645e-03 -9.999413164504e-01 -5.403984729748e-02 9.999738645903e-01 4.859485810390e-04 -7.206933692422e-03 -2.921968648686e-01
I can get that P0, P1 represent monochrome camera and P2, P3 color camera. To my understanding, common shape of camera intrinsic is
fx 0 cx
0 fy cy
0 0 1 .
So I cannot figure out the meaning of remaining three parameter (I guess used for distortion rectification) in each line and the last line following label Tr
.
A similar question can be found from this post, but answers to it are still unobvious to me. Can anyone help me out?
In those files,
P1
,P0
etc are not camera intrinsics but projection matrices, defined by something likewhich means that they are of size
3*4
. I'm not exactly sure whether the corresponding definition is the one above or if you'll have to use (well, it's an equivalent definition, more or less...)but I think that it's easy to check this by just reading/displaying the data.
As for
Tr
, it is the concatenation of all camera positions. You have four camerasP0, ..., P3
, andTr
has12
elements, so the first three are the translation ofP0
, the next three are the translation ofP1
and so on. What I'm not sure about is whether each of those are expressed asT_i
or-R_i.transpose()*T_i
. I think the safest way is try to check this by playing with the data.As for why there are four cameras
P0, ...,P3
, to quote their paper:Here, i ∈ { 0 , 1 , 2 , 3 } is the camera index, where 0 represents the left grayscale, 1 the right grayscale, 2 the left color and 3 the right color camera.
I think that also explains why their projection matrices are close to each other.