Format of parameters in KITTI's calibration file

Posted 2019-09-19 13:44

Question:

I downloaded the calibration files for the odometry part of KITTI. One of them contains the following:

P0: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 0.000000000000e+00 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P1: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 -3.861448000000e+02 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P2: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 4.538225000000e+01 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 -1.130887000000e-01 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 3.779761000000e-03
P3: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 -3.372877000000e+02 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 2.369057000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 4.915215000000e-03
Tr: 4.276802385584e-04 -9.999672484946e-01 -8.084491683471e-03 -1.198459927713e-02 -7.210626507497e-03 8.081198471645e-03 -9.999413164504e-01 -5.403984729748e-02 9.999738645903e-01 4.859485810390e-04 -7.206933692422e-03 -2.921968648686e-01
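To make the layout of these lines concrete, here is a minimal Python sketch that reads such a file into 3x4 matrices. It assumes each line has the form "key: v1 ... v12" and that the 12 values are stored row by row (row-major), which is consistent with where the zeros and ones fall in the lines above. The function name parse_kitti_calib is hypothetical, not part of any KITTI tool.

```python
# Three lines copied verbatim from the calibration file above.
SAMPLE = """\
P0: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 0.000000000000e+00 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P1: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 -3.861448000000e+02 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
Tr: 4.276802385584e-04 -9.999672484946e-01 -8.084491683471e-03 -1.198459927713e-02 -7.210626507497e-03 8.081198471645e-03 -9.999413164504e-01 -5.403984729748e-02 9.999738645903e-01 4.859485810390e-04 -7.206933692422e-03 -2.921968648686e-01
"""


def parse_kitti_calib(text):
    """Parse 'key: v1 ... v12' lines into 3x4 matrices (lists of rows).

    Assumes row-major storage: values 0-3 are row 0, 4-7 row 1, 8-11 row 2.
    """
    matrices = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, values = line.partition(":")
        numbers = [float(v) for v in values.split()]
        if len(numbers) != 12:
            raise ValueError(f"{key}: expected 12 values, got {len(numbers)}")
        # Cut the flat list into 3 rows of 4 columns.
        matrices[key.strip()] = [numbers[r * 4:(r + 1) * 4] for r in range(3)]
    return matrices


if __name__ == "__main__":
    calib = parse_kitti_calib(SAMPLE)
    for row in calib["P1"]:
        print(row)
    # [718.856, 0.0, 607.1928, -386.1448]
    # [0.0, 718.856, 185.2157, 0.0]
    # [0.0, 0.0, 1.0, 0.0]
```

For a real file, passing the text of the sequence's calib.txt to parse_kitti_calib should work the same way; where that file lives depends on how the dataset was unpacked.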

I understand that P0 and P1 belong to the grayscale (monochrome) cameras and P2 and P3 to the color cameras. As far as I know, a camera intrinsic matrix usually has the form

fx 0  cx
0  fy cy
0  0  1

So I cannot figure out what the remaining three parameters in each line mean (I guess they are used for distortion rectification), nor what the last line, labeled Tr, means.

A similar question was asked in this post, but its answers are still unclear to me. Can anyone help me out?

Answer 1:

In those files, P0, P1, etc. are not camera intrinsics but projection matrices, defined by something like

P1=calibration_matrix * [R_1 | T_1]  

which means that they are 3x4 matrices. Depending on whether [R_1 | T_1] describes the world-to-camera transform or the camera's pose in the world, you may instead need the equivalent form

P1=calibration_matrix*[R_1.transpose() | -R_1.transpose()*T_1] 

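For KITTI this is easy to settle from the data itself. The left 3x3 block of every P_i is the same upper-triangular matrix (fx = fy = 718.856, cx = 607.1928, cy = 185.2157), so it can be read directly as the calibration matrix K, with R_i equal to the identity: all four rectified cameras share one orientation, and P_i = K * [I | T_i]. The remaining three values in each line are therefore not distortion parameters (the images are already rectified) but the fourth column K * T_i, which encodes where camera i sits relative to camera 0. For example, P1's entry -386.1448 equals -fx * b, which gives a stereo baseline b of 386.1448 / 718.856, about 0.54 m.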
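The decomposition can be checked numerically with a short Python sketch. It assumes, as argued above, that each P_i has the form K * [I | T_i] with K the left 3x3 block, and recovers T_i by solving K * T_i = p4 (p4 being the fourth column) by back-substitution. Here T_i is the translation from camera 0's rectified frame into camera i's frame, so with R_i = I the camera center is -T_i. The helper name split_projection is hypothetical.

```python
# Rectified projection matrices P1 and P2 copied from the file (row-major).
P1 = [[718.856, 0.0, 607.1928, -386.1448],
      [0.0, 718.856, 185.2157, 0.0],
      [0.0, 0.0, 1.0, 0.0]]
P2 = [[718.856, 0.0, 607.1928, 45.38225],
      [0.0, 718.856, 185.2157, -0.1130887],
      [0.0, 0.0, 1.0, 0.003779761]]


def split_projection(P):
    """Split P = K [I | t] into the 3x3 matrix K and the translation t.

    Assumes the left 3x3 block of P is the upper-triangular K (no rotation)
    and recovers t by solving K t = p4, where p4 is P's fourth column.
    """
    K = [row[:3] for row in P]
    p4 = [row[3] for row in P]
    # Back-substitution, bottom row first, since K is upper-triangular.
    tz = p4[2] / K[2][2]
    ty = (p4[1] - K[1][2] * tz) / K[1][1]
    tx = (p4[0] - K[0][1] * ty - K[0][2] * tz) / K[0][0]
    return K, [tx, ty, tz]


if __name__ == "__main__":
    K1, t1 = split_projection(P1)
    K2, t2 = split_projection(P2)
    print(f"baseline P0-P1: {abs(t1[0]):.3f} m")  # baseline P0-P1: 0.537 m
    print("t2 (m):", [round(v, 4) for v in t2])  # t2 (m): [0.0599, -0.0011, 0.0038]
```

Solving K * T_i = p4 instead of just dividing the first entry by fx matters for P2 and P3, whose fourth columns also have nonzero second and third entries. For P2 the camera center is about 6 cm in the negative x direction (to the left, since camera x points right) of camera 0.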

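As for Tr, it is not a concatenation of the four camera translations. Like the P_i, its 12 values form a 3x4 matrix stored row by row, but here it is a rigid-body transform [R | t] that maps a point from the Velodyne laser scanner's coordinate system into the rectified coordinate system of camera 0 (the left grayscale camera). You can see this in the data: the left 3x3 block is almost a permutation matrix that sends Velodyne x (forward) to camera z, Velodyne y (left) to camera -x, and Velodyne z (up) to camera -y, matching the usual axis conventions of the two sensors, while the last column (about -0.012, -0.054, -0.292 m) is a small translation. A Velodyne point X in homogeneous coordinates is projected into image i by x = P_i * Tr * X, after padding Tr to 4x4 with a final row (0, 0, 0, 1).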
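Here is a minimal Python sketch of the chain x = P_i * Tr * X, assuming (as the KITTI odometry devkit describes) that Tr maps Velodyne points into camera 0's rectified frame. Applying Tr to the point and then re-appending the homogeneous 1 is equivalent to padding Tr with a (0, 0, 0, 1) row. The helper names and the test point are hypothetical.

```python
# Tr and P0 copied from the calibration file (row-major 3x4 matrices).
TR = [[4.276802385584e-04, -9.999672484946e-01, -8.084491683471e-03, -1.198459927713e-02],
      [-7.210626507497e-03, 8.081198471645e-03, -9.999413164504e-01, -5.403984729748e-02],
      [9.999738645903e-01, 4.859485810390e-04, -7.206933692422e-03, -2.921968648686e-01]]
P0 = [[718.856, 0.0, 607.1928, 0.0],
      [0.0, 718.856, 185.2157, 0.0],
      [0.0, 0.0, 1.0, 0.0]]


def apply_3x4(M, v):
    """Apply a 3x4 matrix to a 3D point v, treating v as homogeneous (v, 1)."""
    x, y, z = v
    return [r[0] * x + r[1] * y + r[2] * z + r[3] for r in M]


def project_velodyne_point(P, Tr, point):
    """Return (u, v, depth) for a Velodyne point, i.e. x = P * Tr * X.

    Tr is applied first to get camera 0 coordinates; re-appending the
    homogeneous 1 is equivalent to padding Tr with a (0, 0, 0, 1) row.
    """
    cam = apply_3x4(Tr, point)   # point in camera 0 coordinates
    u, v, w = apply_3x4(P, cam)  # homogeneous pixel coordinates
    if w <= 0:
        raise ValueError("point is behind the camera")
    return u / w, v / w, cam[2]


if __name__ == "__main__":
    # A hypothetical point 10 m straight ahead of the Velodyne.
    u, v, depth = project_velodyne_point(P0, TR, (10.0, 0.0, 0.0))
    print(f"{u:.1f} {v:.1f} {depth:.2f}")  # 606.6 175.9 9.71
```

The point lands close to the principal point column cx = 607.19, as expected for something straight ahead. Points with non-positive depth are rejected, because they lie behind the camera and would otherwise project to meaningless pixels.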

As for why there are four cameras P0, ..., P3, to quote the KITTI paper:

Here, i ∈ {0, 1, 2, 3} is the camera index, where 0 represents the left grayscale, 1 the right grayscale, 2 the left color and 3 the right color camera.

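This also explains why their projection matrices are so close to each other: the first three columns are identical for all four cameras, and only the fourth column, which depends on where each camera sits on the rig, differs.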
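A quick Python check of that observation, using the four P lines from the file typed in as row-major lists (the helper name compare_projections is hypothetical): it confirms that every left 3x3 block matches P0's and collects the fourth columns, which are the only place the matrices differ.

```python
# The four projection matrices from the calibration file (row-major).
PROJECTIONS = {
    "P0": [[718.856, 0.0, 607.1928, 0.0],
           [0.0, 718.856, 185.2157, 0.0],
           [0.0, 0.0, 1.0, 0.0]],
    "P1": [[718.856, 0.0, 607.1928, -386.1448],
           [0.0, 718.856, 185.2157, 0.0],
           [0.0, 0.0, 1.0, 0.0]],
    "P2": [[718.856, 0.0, 607.1928, 45.38225],
           [0.0, 718.856, 185.2157, -0.1130887],
           [0.0, 0.0, 1.0, 0.003779761]],
    "P3": [[718.856, 0.0, 607.1928, -337.2877],
           [0.0, 718.856, 185.2157, 2.369057],
           [0.0, 0.0, 1.0, 0.004915215]],
}


def compare_projections(projections, reference="P0"):
    """Check that all matrices share the reference's left 3x3 block.

    Returns (shared, fourth_columns): shared is True if every left 3x3
    block equals the reference's; fourth_columns maps names to column 4.
    """
    ref_block = [row[:3] for row in projections[reference]]
    shared = all([row[:3] for row in M] == ref_block for M in projections.values())
    fourth_columns = {name: [row[3] for row in M] for name, M in projections.items()}
    return shared, fourth_columns


if __name__ == "__main__":
    shared, columns = compare_projections(PROJECTIONS)
    print("same left 3x3 block:", shared)  # same left 3x3 block: True
    for name, col in columns.items():
        print(name, col)
```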