I am trying to estimate the pose of a camera from two images taken with it: I detect features in both images, match them, estimate the fundamental matrix, use the camera intrinsics to compute the essential matrix, and then decompose it to find the rotation R and translation T.
Here is the MATLAB code:
% Read the two images and convert them to grayscale
I1 = rgb2gray(imread('1.png'));
I2 = rgb2gray(imread('2.png'));
% Detect SURF features and keep only the 40 strongest in each image
points1 = detectSURFFeatures(I1);
points2 = detectSURFFeatures(I2);
points1 = points1.selectStrongest(40);
points2 = points2.selectStrongest(40);
% Extract descriptors and match them between the two images
[features1, valid_points1] = extractFeatures(I1, points1);
[features2, valid_points2] = extractFeatures(I2, points2);
indexPairs = matchFeatures(features1, features2);
matchedPoints1 = valid_points1(indexPairs(:, 1), :);
matchedPoints2 = valid_points2(indexPairs(:, 2), :);
%figure; showMatchedFeatures(I1, I2, matchedPoints1, matchedPoints2);
% Estimate the fundamental matrix from the matched points
F = estimateFundamentalMatrix(matchedPoints1, matchedPoints2);
% Camera intrinsics (entered in MATLAB's transposed IntrinsicMatrix layout)
K = [2755.30930612600,0,0;0,2757.82356074384,0;1652.43432833339,1234.09417974414,1];
% Essential matrix from the fundamental matrix and the intrinsics
E = transpose(K)*F*K;
% Decompose E via SVD; W is the rotation used in the standard decomposition,
% Z (the skew-symmetric matrix) is defined but not used below
W = [0,-1,0;1,0,0;0,0,1];
Z = [0,1,0;-1,0,0;0,0,0];
[U,S,V] = svd(E);
R = U*inv(W)*transpose(V);
T = U(:,3);
% Recover Euler angles in degrees, assuming R = Rz(thetaZ)*Ry(thetaY)*Rx(thetaX)
thetaX = radtodeg(atan2(R(3,2),R(3,3)));
thetaY = radtodeg(atan2(-R(3,1),sqrt(R(3,2)^2 + R(3,3)^2)));
thetaZ = radtodeg(atan2(R(2,1),R(1,1)));
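I am not sure whether this is related, but I have read that the SVD decomposition of E can give more than one candidate for R and T (two possible rotations, and T only up to sign), and that a valid essential matrix should have two equal singular values and one near zero. Here is a sketch of how I think the candidates would be enumerated; I have not verified it, and the final check is only described in a comment:
% Sketch (unverified): check E's singular values and list the candidate poses.
% A valid essential matrix should have singular values roughly [s; s; 0].
[U, S, V] = svd(E);
disp(diag(S));
W = [0 -1 0; 1 0 0; 0 0 1];
R1 = U * W  * V';                 % first candidate rotation
R2 = U * W' * V';                 % second candidate rotation
if det(R1) < 0, R1 = -R1; end     % enforce det(R) = +1
if det(R2) < 0, R2 = -R2; end
t  = U(:, 3);                     % translation direction, up to sign
candidates = {R1, t; R1, -t; R2, t; R2, -t};
% the correct candidate is supposedly the one that places triangulated
% points in front of both cameras (cheirality), which I have not implemented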
The problem I am facing is that R and T are always incorrect. ThetaZ is usually around 90 degrees; if I repeat the calculation many times I occasionally get the expected angles, but only in some cases.
I don't understand why. It might be because the fundamental matrix I calculated is wrong, or is there a different spot where I am going wrong?
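If it helps with diagnosing this, here is how I would try to sanity-check F: look at the epipolar constraint (assuming the [x2 y2 1]*F*[x1 y1 1]' convention, it should be close to zero for good matches) and at how many matches estimateFundamentalMatrix keeps as RANSAC inliers. The parameter values below are just guesses on my part:
% Sketch: re-estimate F with explicit RANSAC settings and inspect the residuals
[F, inlierIdx] = estimateFundamentalMatrix(matchedPoints1, matchedPoints2, ...
    'Method', 'RANSAC', 'NumTrials', 2000, 'DistanceThreshold', 0.01);
p1 = [matchedPoints1.Location, ones(matchedPoints1.Count, 1)];  % homogeneous pixel coords
p2 = [matchedPoints2.Location, ones(matchedPoints2.Count, 1)];
residuals = abs(sum((p2 * F) .* p1, 2));  % |x2' * F * x1| for each match
disp(mean(residuals));                    % should be small if F fits the matches
disp(nnz(inlierIdx));                     % number of RANSAC inliers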
Also, what scale/units is T (the translation vector) in? Or is it determined differently?
P.S. New to computer vision...