KITTI dataset camera projection matrix

Posted 2020-06-28 12:55

Question:

I am looking at the KITTI dataset, in particular at how to convert a world point into image coordinates. The README (quoted below) says that I need to transform the point into camera coordinates first and then multiply by the projection matrix. Coming from a non-computer-vision background, I have two questions:

  1. I looked at the numbers in calib.txt, in particular the projection matrix, which is 3x4 with non-zero values in the last column. I always thought this matrix = K[I|0], where K is the camera's intrinsic matrix. So why is the last column non-zero, and what does it mean? E.g. P2 is:
array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
       [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
       [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
  2. After applying the projection to get [u, v, w] and dividing u and v by w, are the resulting values relative to an origin at the center of the image or at its top-left corner?

README:

calib.txt: Calibration data for the cameras: P0/P1 are the 3x4 projection matrices after rectification. Here P0 denotes the left and P1 denotes the right camera. Tr transforms a point from velodyne coordinates into the left rectified camera coordinate system. In order to map a point X from the velodyne scanner to a point x in the i'th image plane, you thus have to transform it like:

  x = Pi * Tr * X
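The README's formula can be sketched in NumPy as follows. This is a minimal sketch, assuming Tr is stored as a 3x4 matrix (as in calib.txt) that has to be padded with a [0, 0, 0, 1] row so that the product P * Tr is defined, and that points with non-positive depth should be discarded. The function name `project_velo_to_image` is hypothetical, and the `Tr` in the usage lines is a hypothetical axis-swapping placeholder, not a real KITTI calibration.

```python
import numpy as np

def project_velo_to_image(P, Tr, pts_velo):
    """Project Nx3 velodyne points into image i using x = P @ Tr @ X.

    P:  3x4 rectified projection matrix of camera i (e.g. P0 or P2 from calib.txt)
    Tr: 3x4 transform from velodyne to rectified camera 0 coordinates
    Returns (uv, mask): Mx2 pixel coordinates (origin at the image's top-left)
    of the points in front of camera i, and the boolean mask selecting them.
    """
    n = pts_velo.shape[0]
    X = np.hstack([pts_velo, np.ones((n, 1))])      # Nx4 homogeneous points
    Tr4 = np.vstack([Tr, [0.0, 0.0, 0.0, 1.0]])     # pad to 4x4 so P @ Tr4 is 3x4
    x = P @ Tr4 @ X.T                               # 3xN, rows are u*w, v*w, w
    mask = x[2] > 0                                 # w is the point's depth in camera i
    uv = (x[:2, mask] / x[2, mask]).T               # divide by w
    return uv, mask

# P2 from the question's calib.txt.
P2 = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
               [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
               [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])

# HYPOTHETICAL Tr: a pure axis swap with no translation
# (velodyne x fwd, y left, z up -> camera x right, y down, z fwd).
# This is NOT a real KITTI calibration; read Tr from calib.txt instead.
Tr = np.array([[0.0, -1.0,  0.0, 0.0],
               [0.0,  0.0, -1.0, 0.0],
               [1.0,  0.0,  0.0, 0.0]])

pts = np.array([[10.0, 0.0, 0.0],    # 10 m straight ahead of the scanner
                [-5.0, 0.0, 0.0]])   # 5 m behind it: dropped by the mask
uv, mask = project_velo_to_image(P2, Tr, pts)
print(uv, mask)  # u is about 606.2, v about 183.0; mask is [True, False]
```

Chaining P @ Tr4 before dividing by w keeps the whole pipeline linear until the final perspective division, which is exactly the order the README formula describes.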

Answer 1:

Refs:

  1. How to understand the KITTI camera calibration files?
  2. Format of parameters in KITTI's calibration file
  3. http://www.cvlibs.net/publications/Geiger2013IJRR.pdf

Answer:

I strongly recommend reading the references above; they may answer most, if not all, of your questions.

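For question 2: the projected points are relative to an origin at the top-left corner of the image. As refs 2 and 3 show, a 3D point very far away along the optical axis projects to (center_x, center_y), the principal point, whose values are given in the P_rect matrices. If the origin were at the image center, that point would land near (0, 0); instead it lands at (center_x, center_y), which are positive values of a few hundred pixels. You can verify this with a few lines of code: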

import numpy as np

# P2 from the question's calib.txt
p = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
              [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
              [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
x = np.array([0, 0, 1e8, 1])  # a far 3D point on the optical axis (homogeneous)
y = p @ x                     # [u*w, v*w, w]
y = y[:2] / y[2]              # divide u and v by w to get pixel coordinates
print(y)

You will see output close to:

[601.8873 183.1104]

which is very close to (p[0, 2], p[1, 2]), i.e. (center_x, center_y).
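If you need coordinates relative to the principal point instead of the top-left corner, a minimal sketch is to subtract (cx, cy) = (P[0, 2], P[1, 2]). The helper name `to_principal_point_frame` is hypothetical. Keep in mind that the principal point is only approximately the geometric center of the image, and that v still grows downward because the KITTI camera y axis points down.

```python
import numpy as np

def to_principal_point_frame(uv, P):
    """Shift Nx2 pixel coordinates (origin at the image's top-left corner)
    so that the origin is the principal point (cx, cy) = (P[0, 2], P[1, 2]).
    u still grows to the right and v still grows downward."""
    cx, cy = P[0, 2], P[1, 2]
    return np.asarray(uv, dtype=float) - np.array([cx, cy])

# P2 from the question's calib.txt.
P2 = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
               [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
               [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])

# The far-point projection from above, and the top-left corner pixel (0, 0).
centered = to_principal_point_frame([[601.8873, 183.1104], [0.0, 0.0]], P2)
print(centered)  # first row is about (0, 0); second row is (-cx, -cy)
```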



Answer 2:

Each of the 3x4 P matrices has the form:

P(i)rect = [[fu   0  cx  -fu*bx],
            [ 0  fv  cy  -fv*by],
            [ 0   0   1       0]]

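The last column holds the baselines (bx, by, in meters w.r.t. the reference camera 0) multiplied by the focal lengths, so under this form the baseline itself is bx = -P[0, 3] / fu. P0 has all zeros in the last column because it is the reference camera. The form above assumes the offset has no z component; in the actual files the third entry of the last column (e.g. 6.203223e-03 in P2) is small but non-zero, so strictly the last column is K·t, with K the left 3x3 block and t the camera's translation.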
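To see where the last column comes from, here is a minimal sketch, assuming each rectified P has the form K·[I | t] (the matrix above is this form with t = (-bx, -by, 0)). It recovers K as the left 3x3 block and t by solving K t = P[:, 3]. The function name `split_projection` is hypothetical. Applied to the question's P2, the solved t has a small non-zero z component, which is why the bottom-right entry is 6.203223e-03 rather than 0.

```python
import numpy as np

def split_projection(P):
    """Split a 3x4 rectified projection P = K @ [I | t] into K (3x3) and t (3,)."""
    K = P[:, :3]                      # intrinsics: fu, fv, cx, cy
    t = np.linalg.solve(K, P[:, 3])   # translation w.r.t. reference camera 0, in meters
    return K, t

# P2 from the question's calib.txt.
P2 = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
               [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
               [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])

K, t = split_projection(P2)
print(t)  # t[0] is about 0.061 m; P2[0, 3] / fu alone gives about 0.066 m
          # because it ignores the cx * t[2] contribution
```

Solving the linear system rather than dividing by fu accounts for the cx·t[2] and cy·t[2] terms that the simplified matrix drops.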

This post has more details: How Kitti calibration matrix was calculated?