I am looking at the KITTI dataset, and in particular at how to convert a world point into image coordinates. The README (quoted below) says I need to transform the point into camera coordinates first and then multiply by the projection matrix. I come from a non-computer-vision background and have two questions:
- I looked at the numbers in calib.txt, and the projection matrix is 3x4 with non-zero values in the last column. I always thought this matrix was P = K[I|0], where K is the camera's intrinsic matrix. So why is the last column non-zero, and what does it mean? For example, P2 is:
array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01], [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01], [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
- After projecting to [u, v, w] and dividing u and v by w, are the resulting values measured from an origin at the center of the image or from an origin at the top left of the image?
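To make the first question concrete, here is a minimal sketch of how I inspected the last column. It rests on an assumption of mine that is not stated in the README: that the matrix can be written as P = K[I|t] rather than K[I|0], so the last column would be K times some offset t. The names `K`, `p4` and `t` are my own.

```python
import numpy as np

# P2 exactly as printed above.
P2 = np.array([
    [7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
    [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
    [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03],
])

K = P2[:, :3]   # left 3x3 block, which I am treating as the intrinsic matrix
p4 = P2[:, 3]   # the non-zero last column I am asking about

# Assumption: P2 = K [I | t], so the last column equals K @ t.
t = np.linalg.solve(K, p4)

# Sanity check: rebuilding K [I | t] should give back P2.
P2_rebuilt = K @ np.hstack([np.eye(3), t.reshape(3, 1)])
print("t =", t)
print("matches P2:", np.allclose(P2_rebuilt, P2))
```

The factorization always works numerically as long as K is invertible, so it does not by itself tell me what t means physically, which is what I am trying to understand.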
README:
calib.txt: Calibration data for the cameras: P0/P1 are the 3x4 projection matrices after rectification. Here P0 denotes the left and P1 denotes the right camera. Tr transforms a point from velodyne coordinates into the left rectified camera coordinate system. In order to map a point X from the velodyne scanner to a point x in the i'th image plane, you thus have to transform it like:
x = Pi * Tr * X
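For reference, this is a minimal sketch of how I currently read the README formula. The function name `project_to_image` is my own, and I am assuming that Tr is stored as a 3x4 matrix that has to be padded with a [0, 0, 0, 1] row before it can be chained with Pi. The identity `Tr_placeholder` below is NOT a real calibration value; it only lets the snippet run.

```python
import numpy as np

def project_to_image(points_velo, P, Tr):
    """Apply x = P * Tr * X to an (N, 3) array of points.

    Returns the (N, 2) values after dividing u and v by w, plus w itself.
    """
    n = points_velo.shape[0]
    X = np.hstack([points_velo, np.ones((n, 1))])   # (N, 4) homogeneous points
    Tr4 = np.vstack([Tr, [0.0, 0.0, 0.0, 1.0]])     # assumed 3x4, padded to 4x4
    x = (P @ Tr4 @ X.T).T                           # (N, 3) rows of [u, v, w]
    w = x[:, 2]
    return x[:, :2] / w[:, None], w

# P2 from calib.txt, as quoted in my question.
P2 = np.array([
    [7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
    [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
    [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03],
])

# Placeholder only: identity rotation, zero translation (hypothetical).
Tr_placeholder = np.hstack([np.eye(3), np.zeros((3, 1))])

# One made-up point 10 units straight ahead along the camera's z axis.
pts = np.array([[0.0, 0.0, 10.0]])
uv, w = project_to_image(pts, P2, Tr_placeholder)
print(uv, w)
```

With a real Tr, I would presumably also have to discard points with w <= 0 before dividing, since those would lie behind the camera.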