Assume I have a 2x2 matrix filled with values which will represent a plane. Now I want to rotate the plane around itself in a 3-D way, in the "z-Direction". For a better understanding, see the following image:
I wondered if this is possible by a simple affine matrix, thus I created the following simple script:
%Create a random value matrix
A = rand*ones(200,200);
%Make a box in the image
A(50:200-50,50:200-50) = 1;
Now I can apply transformations in the 2-D room simply by a rotation matrix like this:
R = affine2d([1 0 0; .5 1 0; 0 0 1])
tform = affine3d(R);
transformed = imwarp(A,tform);
However, this will not produce the desired output above, and I am not quite sure how to create the 2-D affine matrix to create such behavior.
I guess that a 3-D affine matrix can do the trick. However, if I define a 3-D affine matrix I cannot work with the 2-D representation of the matrix anymore, since MATLAB will throw the error:
The number of dimensions of the input image A must be 3 when the
specified geometric transformation is 3-D.
So how can I code the desired output with an affine matrix?
You can perform a projective transformation that can be estimated using the position of the corners in the first and second image.
This is the result of the code above:
Original image:
Transformed image:
The answer from m3tho correctly addresses how you would apply the transformation you want: using
fitgeotrans
with a'projective'
transform, thus requiring that you specify 4 control points (i.e. 4 pairs of corresponding points in the input and output image). You can then apply this transform usingimwarp
.The issue, then, is how you select these pairs of points to create your desired transformation, which in this case is to create a perspective projection. As shown below, a perspective projection takes into account that a viewing position (i.e. "camera") will have a given view angle defining a conic field of view. The scene is rendered by taking all 3-D points within this cone and projecting them onto the viewing plane, which is the plane located at the camera target which is perpendicular to the line joining the camera and its target.
Let's first assume that your image is lying in the viewing plane and that the corners are described by a normalized reference frame such that they span
[-1 1]
in each direction. We need to first select the degree of perspective we want by choosing a view angle and then computing the distance between the camera and the viewing plane. A view angle of around 45 degrees can mimic the sense of perspective of normal human sight, so using the corners of the viewing plane to define the edge of the conic field of view, we can compute the camera distance as follows:Now we can use this to generate a set of control points for the transformation. We first apply a 3-D rotation to the corner points of the viewing plane, rotating around the y axis by an amount
theta
. This rotates them out of plane, so we now project the corner points back onto the viewing plane by defining a line from the camera through each rotated corner point and finding the point where it intersects the plane. I'm going to spare you the mathematical derivations (you can implement them yourself from the formulas in the above links), but in this case everything simplifies down to the following set of calculations:And
outP
now contains your normalized set of control points in the output image. Given an image of sizes
, we can create a set of input and output control points as follows:And you can apply the transformation like so:
The only issue you may come across is that a rotation of 90 degrees (i.e. looking end-on at the image plane) would create a set of collinear points that would cause
fitgeotrans
to error out. In such a case, you would technically just want a blank image, because you can't see a 2-D object when looking at it edge-on.Here's some code illustrating the above transformations by animating a spinning image:
And here's the animation: