I am trying to get a global pose estimate from an image of four fiducials with known global positions using my webcam.
I have checked many Stack Exchange questions and a few papers, and I cannot seem to get a correct solution. The position numbers I get are repeatable but in no way linearly proportional to camera movement. FYI, I am using C++ OpenCV 2.1.
At this link is pictured my coordinate systems and the test data used below.
% Input to solvePnP():
imagePoints = [ 481, 831; % [x, y] format
520, 504;
1114, 828;
1106, 507]
objectPoints = [0.11, 1.15, 0; % [x, y, z] format
0.11, 1.37, 0;
0.40, 1.15, 0;
0.40, 1.37, 0]
% camera intrinsics for Logitech C910
cameraMat = [1913.71011, 0.00000, 1311.03556;
0.00000, 1909.60756, 953.81594;
0.00000, 0.00000, 1.00000]
distCoeffs = [0, 0, 0, 0, 0]
% output of solvePnP():
tVec = [-0.3515;
0.8928;
0.1997]
rVec = [2.5279;
-0.09793;
0.2050]
% using Rodrigues to convert back to rotation matrix:
rMat = [0.9853, -0.1159, 0.1248;
-0.0242, -0.8206, -0.5708;
0.1686, 0.5594, -0.8114]
So far, can anyone see anything wrong with these numbers? I would appreciate it if someone would check them in, for example, MATLAB (the code above is m-file friendly).
From this point, I am unsure of how to get the global pose from rMat and tVec.
From what I have read in this question, to get the pose from rMat and tVec is simply:
position = transpose(rMat) * tVec % matrix multiplication
However, I suspect from other sources I have read that it is not that simple.
To get the position of the camera in real world coordinates, what do I need to do?
As I am unsure if this is an implementation problem (however most likely a theory problem) I would like for someone who has used the solvePnP function successfully in OpenCV to answer this question, although any ideas are welcome too!
Thank you very much for your time.
I solved this a while ago, apologies for the year delay.
In the Python OpenCV 2.1 bindings I was using, and in the newer version 3.0.0-dev, I have verified that to get the pose of the camera in the global frame you must do the following:
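_, rVec, tVec = cv2.solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs)
Rt, _ = cv2.Rodrigues(rVec)  # Rodrigues returns (rotation matrix, jacobian)
R = Rt.transpose()
pos = -R.dot(tVec)  # matrix product; `*` on NumPy arrays is element-wise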
Now pos is the position of the camera expressed in the global frame (the same frame the objectPoints are expressed in).
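To check the recipe above against the numbers posted in the question without needing OpenCV, here is a minimal pure-Python sketch. The helper names `rodrigues` and `camera_position` are made up for this illustration (they are not OpenCV functions); `rodrigues` uses the standard axis-angle formula, which should agree with what cv2.Rodrigues returns for a rotation vector.

```python
import math

def rodrigues(rvec):
    """Convert a rotation vector (unit axis times angle) to a 3x3 rotation matrix."""
    theta = math.sqrt(sum(x * x for x in rvec))
    if theta < 1e-12:
        # Near-zero rotation: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (x / theta for x in rvec)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + v * kx * kx, v * kx * ky - s * kz, v * kx * kz + s * ky],
        [v * kx * ky + s * kz, c + v * ky * ky, v * ky * kz - s * kx],
        [v * kx * kz - s * ky, v * ky * kz + s * kx, c + v * kz * kz],
    ]

def camera_position(rvec, tvec):
    """Return -R^T * t, the camera centre in the frame of the object points."""
    R = rodrigues(rvec)
    return [-sum(R[row][col] * tvec[row] for row in range(3)) for col in range(3)]

# rVec and tVec copied from the solvePnP output in the question.
r_vec = [2.5279, -0.09793, 0.2050]
t_vec = [-0.3515, 0.8928, 0.1997]

R = rodrigues(r_vec)
pos = camera_position(r_vec, t_vec)
print(R)
print(pos)
```

With the question's data, `R` reproduces the posted rMat to about three decimals, and `pos` comes out at roughly (0.33, 0.58, 0.72) in the units of objectPoints, so the question's solvePnP output itself looks self-consistent.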
R is the attitude matrix (a direction cosine matrix, DCM), which is a good form in which to store the attitude.
If you require Euler angles then you can convert the DCM to Euler angles given an XYZ rotation sequence using:
roll = atan2(-R[2][1], R[2][2])
pitch = asin(R[2][0])
yaw = atan2(-R[1][0], R[0][0])
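As a sanity check on those three formulas, here is a small round-trip sketch in pure Python. It assumes the DCM is composed as Rz(yaw) * Ry(pitch) * Rx(roll) from frame (passive) rotations, which is the convention under which the formulas above invert exactly; if your DCM is built differently, the indices will differ. The function names are made up for this example.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def dcm_from_euler(roll, pitch, yaw):
    """Build a DCM as Rz(yaw) * Ry(pitch) * Rx(roll) from frame rotations (assumed convention)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    rx = [[1.0, 0.0, 0.0], [0.0, cr, sr], [0.0, -sr, cr]]
    ry = [[cp, 0.0, -sp], [0.0, 1.0, 0.0], [sp, 0.0, cp]]
    rz = [[cy, sy, 0.0], [-sy, cy, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(rz, matmul3(ry, rx))

def euler_from_dcm(R):
    """Recover (roll, pitch, yaw) using the formulas from the answer."""
    roll = math.atan2(-R[2][1], R[2][2])
    pitch = math.asin(R[2][0])
    yaw = math.atan2(-R[1][0], R[0][0])
    return roll, pitch, yaw

# Arbitrary test angles in radians; pitch must stay inside (-pi/2, pi/2).
angles = (0.3, -0.2, 1.1)
recovered = euler_from_dcm(dcm_from_euler(*angles))
print(recovered)
```

The recovered angles should match the inputs up to floating-point error as long as the pitch stays away from +/-90 degrees, where roll and yaw are no longer separable.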
If by global pose you mean a 4x4 camera pose matrix that can be used in OpenGL, I do it this way:
CvMat* ToOpenGLCos( const CvMat* tVec, const CvMat* rVec )
{
//** flip COS 180 degree around x-axis **//
// Rodrigues to rotation matrix
CvMat* extRotAsMatrix = cvCreateMat(3,3,CV_32FC1);
cvRodrigues2(rVec,extRotAsMatrix);
// Simply merge rotation matrix and translation vector to 4x4 matrix
CvMat* world2CameraTransformation = CreateTransformationMatrixH(tVec,
extRotAsMatrix );
// Create correction rotation matrix (180 deg x-axis)
CvMat* correctionMatrix = cvCreateMat(4,4,CV_32FC1);
/* 1.00000 0.00000 0.00000 0.00000
0.00000 -1.00000 -0.00000 0.00000
0.00000 0.00000 -1.00000 0.00000
0.00000 0.00000 0.00000 1.00000 */
cvmSet(correctionMatrix,0,0,1.0); cvmSet(correctionMatrix,0,1,0.0);
...
// Flip it
CvMat* world2CameraTransformationOpenGL = cvCreateMat(4,4,CV_32FC1);
cvMatMul(correctionMatrix,world2CameraTransformation, world2CameraTransformationOpenGL);
CvMat* camera2WorldTransformationOpenGL = cvCreateMat(4,4,CV_32FC1);
cvInv(world2CameraTransformationOpenGL,camera2WorldTransformationOpenGL,
CV_LU );
cvReleaseMat( &world2CameraTransformationOpenGL );
...
return camera2WorldTransformationOpenGL;
}
I think flipping the coordinate system is necessary because OpenCV and OpenGL/VTK/etc. use different coordinate systems, as illustrated in this picture: OpenGL and OpenCV Coordinate Systems.
Well, it works this way but somebody might have a better explanation.
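For readers who want the same flip-then-invert idea without the legacy C API, here is a hedged pure-Python sketch. It is not a translation of the function above: instead of a general LU inversion it uses the closed-form inverse of a rigid transform (transpose the rotation, negate the rotated translation), and `to_opengl_camera_pose` is a name invented for this example.

```python
def to_opengl_camera_pose(R, t):
    """Return the 4x4 camera-to-world matrix after a 180-degree flip about x.

    R is a 3x3 rotation (nested lists) and t a 3-vector, as obtained from
    solvePnP plus Rodrigues.
    """
    flip = [1.0, -1.0, -1.0]  # diagonal of the 180-degree rotation about x
    # World-to-camera in the flipped frame has rotation diag(flip) * R and
    # translation diag(flip) * t. The inverse rotation is the transpose.
    rot_inv = [[flip[j] * R[j][i] for j in range(3)] for i in range(3)]
    # Inverse translation: -rot_inv * (diag(flip) * t), which equals -R^T * t.
    trans_inv = [-sum(rot_inv[i][j] * flip[j] * t[j] for j in range(3))
                 for i in range(3)]
    pose = [rot_inv[i] + [trans_inv[i]] for i in range(3)]
    pose.append([0.0, 0.0, 0.0, 1.0])
    return pose

# rMat and tVec as posted in the question (rounded to four decimals).
r_mat = [[0.9853, -0.1159, 0.1248],
         [-0.0242, -0.8206, -0.5708],
         [0.1686, 0.5594, -0.8114]]
t_vec = [-0.3515, 0.8928, 0.1997]

for row in to_opengl_camera_pose(r_mat, t_vec):
    print(row)
```

Note that the last column of the result is -R^T * t, so the flip changes the orientation of the camera axes but not the camera position in the world frame.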
The position of the camera would be -transpose(r) * t. That's it.
You have done everything correctly, except that cv::solvePnP gives a (4 x 1) vector for translation, if I remember right; in that case you would have to divide x, y, and z by the w coordinate.