Camera pose estimation (OpenCV PnP)

Posted 2019-01-13 22:59

I am trying to get a global pose estimate of my webcam from an image of four fiducials with known global positions.

I have checked many Stack Exchange questions and a few papers, but I cannot seem to get a correct solution. The position numbers I get out are repeatable but in no way linearly proportional to camera movement. FYI, I am using C++ OpenCV 2.1.

The image at this link shows my coordinate systems and the test data used below.

% Input to solvePnP():
imagePoints =     [ 481, 831; % [x, y] format
                    520, 504;
                   1114, 828;
                   1106, 507]
objectPoints = [0.11, 1.15, 0; % [x, y, z] format
                0.11, 1.37, 0; 
                0.40, 1.15, 0;
                0.40, 1.37, 0]

% camera intrinsics for Logitech C910
cameraMat = [1913.71011, 0.00000,    1311.03556;
             0.00000,    1909.60756, 953.81594;
             0.00000,    0.00000,    1.00000]
distCoeffs = [0, 0, 0, 0, 0]

% output of solvePnP():
tVec = [-0.3515;
         0.8928; 
         0.1997]

rVec = [2.5279;
       -0.09793;
        0.2050]
% using Rodrigues to convert back to rotation matrix:

rMat = [0.9853, -0.1159,  0.1248;
       -0.0242, -0.8206, -0.5708;
        0.1686,  0.5594, -0.8114]
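For anyone who wants to confirm that rMat is consistent with rVec without MATLAB, here is a minimal pure-Python sketch of the standard Rodrigues (axis-angle to rotation matrix) formula, which is the conversion cv2.Rodrigues/cvRodrigues2 perform for a 3x1 input. The helper name rodrigues is illustrative, not an OpenCV function. Note that this only checks the rotation conversion, not whether solvePnP found the right pose.

```python
import math


def rodrigues(rvec):
    """Convert an axis-angle vector (as returned by solvePnP) to a 3x3 rotation matrix.

    Implements R = cos(theta) I + (1 - cos(theta)) k k^T + sin(theta) [k]_x,
    where theta = |rvec| and k = rvec / theta is the unit rotation axis.
    """
    theta = math.sqrt(sum(x * x for x in rvec))
    if theta < 1e-12:
        # No rotation: return the identity matrix
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (x / theta for x in rvec)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + v * kx * kx, v * kx * ky - s * kz, v * kx * kz + s * ky],
        [v * ky * kx + s * kz, c + v * ky * ky, v * ky * kz - s * kx],
        [v * kz * kx - s * ky, v * kz * ky + s * kx, c + v * kz * kz],
    ]


# Values posted in the question
rVec = [2.5279, -0.09793, 0.2050]
rMat = [[0.9853, -0.1159, 0.1248],
        [-0.0242, -0.8206, -0.5708],
        [0.1686, 0.5594, -0.8114]]

R = rodrigues(rVec)
max_diff = max(abs(R[i][j] - rMat[i][j]) for i in range(3) for j in range(3))
print("largest element-wise difference: %.5f" % max_diff)
```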

So far, can anyone see anything wrong with these numbers? I would appreciate it if someone could check them in, for example, MATLAB (the code above is m-file friendly).
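One way to check the numbers is to reproject objectPoints through the posted pose and compare the result with imagePoints. The sketch below assumes an ideal pinhole model with zero skew and no lens distortion (matching distCoeffs = 0) and uses the rounded values posted above; project is a hypothetical helper. If the pose is right, each predicted pixel should land near its measured counterpart; large discrepancies would point to a problem with the point correspondences or the intrinsics.

```python
import math

# Values posted in the question (skew assumed zero, distortion zero)
cameraMat = [[1913.71011, 0.0, 1311.03556],
             [0.0, 1909.60756, 953.81594],
             [0.0, 0.0, 1.0]]
rMat = [[0.9853, -0.1159, 0.1248],
        [-0.0242, -0.8206, -0.5708],
        [0.1686, 0.5594, -0.8114]]
tVec = [-0.3515, 0.8928, 0.1997]
objectPoints = [(0.11, 1.15, 0.0), (0.11, 1.37, 0.0),
                (0.40, 1.15, 0.0), (0.40, 1.37, 0.0)]
imagePoints = [(481, 831), (520, 504), (1114, 828), (1106, 507)]


def project(point, R, t, K):
    """Hypothetical helper: ideal pinhole projection of a world point to pixels."""
    # World -> camera coordinates: x_c = R * X + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division, then focal lengths and principal point from K
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v


errors = []
for X, (u_obs, v_obs) in zip(objectPoints, imagePoints):
    u, v = project(X, rMat, tVec, cameraMat)
    errors.append(math.hypot(u - u_obs, v - v_obs))
    print("predicted (%.1f, %.1f)  measured (%d, %d)" % (u, v, u_obs, v_obs))

print("max reprojection error: %.1f px" % max(errors))
```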

From this point, I am unsure how to get the global pose from rMat and tVec. From what I have read in this question, getting the pose from rMat and tVec is simply:

position = transpose(rMat) * tVec   % matrix multiplication

However, from other sources I have read, I suspect it is not that simple.

What do I need to do to get the position of the camera in real-world coordinates? I am unsure whether this is an implementation problem (though most likely it is a theory problem), so I would like someone who has used OpenCV's solvePnP function successfully to answer, although any ideas are welcome too!

Thank you very much for your time.

3 Answers
爷、活的狠高调
Post #2 · 2019-01-13 23:27

If by global pose you mean a 4x4 camera pose matrix that can be used in OpenGL, I do it this way:

CvMat* ToOpenGLCos( const CvMat* tVec, const CvMat* rVec )
{
    //** Flip the coordinate system (COS) 180 degrees around the x-axis **//

    // Rodrigues to rotation matrix
    CvMat* extRotAsMatrix = cvCreateMat(3,3,CV_32FC1);
    cvRodrigues2(rVec,extRotAsMatrix);

    // Simply merge rotation matrix and translation vector to 4x4 matrix 
    CvMat* world2CameraTransformation = CreateTransformationMatrixH(tVec,
    extRotAsMatrix );

    // Create correction rotation matrix (180 deg x-axis)
    CvMat* correctionMatrix = cvCreateMat(4,4,CV_32FC1);
    /* 1.00000   0.00000   0.00000   0.00000
       0.00000  -1.00000  -0.00000   0.00000
       0.00000   0.00000  -1.00000   0.00000
       0.00000   0.00000   0.00000   1.00000 */
    cvmSet(correctionMatrix,0,0,1.0); cvmSet(correctionMatrix,0,1,0.0);
    ... 

    // Flip it
    CvMat* world2CameraTransformationOpenGL = cvCreateMat(4,4,CV_32FC1);
    cvMatMul(correctionMatrix,world2CameraTransformation,world2CameraTransformationOpenGL);

    CvMat* camera2WorldTransformationOpenGL = cvCreateMat(4,4,CV_32FC1);
    cvInv(world2CameraTransformationOpenGL,camera2WorldTransformationOpenGL,
    CV_LU );

    cvReleaseMat( &world2CameraTransformationOpenGL );
    ...

    return camera2WorldTransformationOpenGL;
}

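I think flipping the coordinate system is necessary because OpenCV and OpenGL/VTK/etc. use different camera coordinate systems: OpenCV's camera looks along +z with y pointing down, while OpenGL's looks along -z with y pointing up, so a 180° rotation about the x-axis maps one onto the other. This is illustrated in this picture: OpenGL and OpenCV Coordinate Systems.

Well, it works this way, but somebody might have a better explanation.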
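The same idea can be sketched in pure Python (an illustration, not the answerer's code; all helper names are hypothetical). It assumes R is the 3x3 matrix from Rodrigues and t the 3-element tVec, both in OpenCV's camera convention. Because the 180° flip is itself a rotation, the flipped matrix is still rigid, so its inverse can be taken as [Rᵀ | -Rᵀt] instead of a general LU inverse.

```python
# Hypothetical helpers; R is the 3x3 Rodrigues output and t the 3-element tVec.

def to_homogeneous(R, t):
    """Merge a 3x3 rotation and a translation into a 4x4 world-to-camera matrix."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(M):
    """Invert [R | t; 0 1] as [R^T | -R^T t; 0 1]; valid only for rigid transforms."""
    Rt = [[M[j][i] for j in range(3)] for i in range(3)]  # transpose of the 3x3 block
    t = [M[i][3] for i in range(3)]
    ti = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [ti[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# 180-degree rotation about x: negates the y and z axes
# (OpenCV camera: y down, looking along +z; OpenGL camera: y up, looking along -z)
FLIP_X = [[1.0, 0.0, 0.0, 0.0],
          [0.0, -1.0, 0.0, 0.0],
          [0.0, 0.0, -1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]


def opencv_to_opengl_camera_pose(R, t):
    """Camera-to-world pose in OpenGL axes from OpenCV's world-to-camera R, t."""
    world_to_cam_gl = matmul4(FLIP_X, to_homogeneous(R, t))
    return rigid_inverse(world_to_cam_gl)


# Usage with the numbers from the question
rMat = [[0.9853, -0.1159, 0.1248],
        [-0.0242, -0.8206, -0.5708],
        [0.1686, 0.5594, -0.8114]]
tVec = [-0.3515, 0.8928, 0.1997]
pose = opencv_to_opengl_camera_pose(rMat, tVec)
for row in pose:
    print(["%.4f" % x for x in row])
```

Since the flip matrix's last column is (0, 0, 0, 1), the translation column of the result is -Rᵀt, the same camera position as without the flip; only the orientation axes change.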

戒情不戒烟
Post #3 · 2019-01-13 23:39

I solved this a while ago; apologies for the year-long delay.

In the Python bindings of OpenCV 2.1 that I was using, and in the newer version 3.0.0-dev, I have verified that to get the pose of the camera in the global frame you must:

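import cv2

# rVec/tVec map world (objectPoints) coordinates into the camera frame
_, rVec, tVec = cv2.solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs)
Rt, _ = cv2.Rodrigues(rVec)   # Rodrigues returns (rotation matrix, Jacobian)
R = Rt.transpose()            # camera-to-world rotation (DCM)
pos = -R.dot(tVec)            # matrix product; '*' would be element-wise on NumPy arrays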

Now pos is the position of the camera expressed in the global frame (the same frame the objectPoints are expressed in). R is an attitude matrix (a direction cosine matrix, DCM), which is a good form in which to store the attitude. If you require Euler angles, you can convert the DCM to Euler angles for an XYZ rotation sequence using:

from math import asin, atan2

roll = atan2(-R[2][1], R[2][2])
pitch = asin(R[2][0])
yaw = atan2(-R[1][0], R[0][0])
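Euler angles depend on a convention, so it is worth checking what the formulas above assume. Working through the algebra, they return the angles for which the world-to-camera matrix from cv2.Rodrigues (the transpose of R) equals Rx(roll)·Ry(pitch)·Rz(yaw), using standard right-handed rotation matrices and |pitch| < 90°. The round-trip sketch below (pure Python; the helper names are illustrative) builds such a matrix from known angles and recovers them. The clamp before asin is an addition that guards against rounding pushing the argument slightly outside [-1, 1].

```python
import math


def rx(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def ry(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rz(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose3(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]


def euler_from_dcm(R):
    """The answer's formulas; R is the camera-to-world DCM (transpose of the Rodrigues output)."""
    roll = math.atan2(-R[2][1], R[2][2])
    # Clamp (an addition): rounding can push |R[2][0]| slightly above 1
    pitch = math.asin(max(-1.0, min(1.0, R[2][0])))
    yaw = math.atan2(-R[1][0], R[0][0])
    return roll, pitch, yaw


# Round trip: world-to-camera = Rx(roll) Ry(pitch) Rz(yaw); R is its transpose
roll, pitch, yaw = 0.3, -0.2, 1.1
R_world_to_cam = matmul3(rx(roll), matmul3(ry(pitch), rz(yaw)))
R = transpose3(R_world_to_cam)
print("recovered (roll, pitch, yaw):", euler_from_dcm(R))
```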
Melony?
Post #4 · 2019-01-13 23:40

The position of the camera would be -transpose(R) * t. That's it.

And you have done everything correctly, except that cv::solvePnP gives a (4 x 1) vector for translation, if I remember right, so you would have to divide x, y, z by the w coordinate.
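To see where the minus sign comes from: solvePnP's rotation R (after Rodrigues) and translation t map a world point X into camera coordinates as x_c = R·X + t. The camera centre C is the world point that maps to the camera origin, so R·C + t = 0 and, since Rᵀ = R⁻¹ for a rotation, C = -Rᵀt. The formula quoted in the question, Rᵀt, is therefore missing the sign. A minimal pure-Python sketch applying this to the question's rMat and tVec, treating tVec as the 3-element vector posted there (no homogeneous division), might look like this; camera_position is a hypothetical helper:

```python
# Values posted in the question
rMat = [[0.9853, -0.1159, 0.1248],
        [-0.0242, -0.8206, -0.5708],
        [0.1686, 0.5594, -0.8114]]
tVec = [-0.3515, 0.8928, 0.1997]


def camera_position(R, t):
    """Camera centre in world (objectPoints) coordinates: C = -R^T t."""
    # (R^T t)_i = sum_k R[k][i] * t[k]
    return [-sum(R[k][i] * t[k] for k in range(3)) for i in range(3)]


pos = camera_position(rMat, tVec)
print("camera position: (%.4f, %.4f, %.4f)" % tuple(pos))

# Sanity check: R * C + t should be (close to) the camera origin;
# it is not exactly zero because rMat is rounded to four decimals
xc = [sum(rMat[i][j] * pos[j] for j in range(3)) + tVec[i] for i in range(3)]
print("R*C + t =", ["%.5f" % v for v in xc])
```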
