Firstly, if you would like an explanation of the GLM lookAt algorithm, please look at the answer provided on this question: https://stackoverflow.com/a/19740748/1525061
    mat4x4 lookAt(vec3 const & eye, vec3 const & center, vec3 const & up)
    {
        vec3 f = normalize(center - eye);
        vec3 u = normalize(up);
        vec3 s = normalize(cross(f, u));
        u = cross(s, f);

        mat4x4 Result(1);
        Result[0][0] = s.x;
        Result[1][0] = s.y;
        Result[2][0] = s.z;
        Result[0][1] = u.x;
        Result[1][1] = u.y;
        Result[2][1] = u.z;
        Result[0][2] =-f.x;
        Result[1][2] =-f.y;
        Result[2][2] =-f.z;
        Result[3][0] =-dot(s, eye);
        Result[3][1] =-dot(u, eye);
        Result[3][2] = dot(f, eye);
        return Result;
    }
Now I'm going to tell you why I seem to be having a conceptual issue with this algorithm. There are two parts to this view matrix: the translation and the rotation. The translation does the correct inverse transformation, bringing the camera position to the origin instead of the origin to the camera position. Similarly, you would expect the rotation that the camera defines to be inverted before being put into this view matrix as well. I can't see that happening here, and that's my issue.
Consider the forward vector, which is the direction your camera looks in. This forward vector needs to be mapped to the -Z axis, which is the forward direction used by OpenGL. The way this view matrix is supposed to work is by creating an orthonormal basis in the columns of the view matrix, so when you multiply a vertex on the right-hand side of this matrix, you are essentially just converting its coordinates to those of a different set of axes.
When I play the rotation that occurs as a result of this transformation in my mind, I see a rotation that is not the inverse rotation of the camera, as is supposed to happen; rather, I see the non-inverse. That is, instead of finding the camera forward being mapped to the -Z axis, I find the -Z axis being mapped to the camera forward.
If you don't understand what I mean, consider a 2D example of the same type of thing that is happening here. Let's say the forward vector is (sqrt(2)/2, sqrt(2)/2), the cos/sin of 45 degrees, and let's also say the side vector for this 2D camera is (sqrt(2)/2, -sqrt(2)/2), the cos/sin of -45 degrees. We want to map this forward vector to (0,1), the positive Y axis. The positive Y axis can be thought of as the analogue of the -Z axis in OpenGL space. Let's consider a vertex in the same direction as our forward vector, namely (1,1). By using the logic of GLM.lookAt, we should be able to map (1,1) to the Y axis by using a 2x2 matrix that consists of the forward vector in the first column and the side vector in the second column. An equivalent calculation is here: http://www.wolframalpha.com/input/?i=%28sqr%282%29%2F2+%2C+sqr%282%29%2F2%29++1+%2B+%28sqr%282%29%2F2%2C+-sqr%282%29%2F2+%29+1
Note that you don't get your (1,1) vertex mapped to the positive Y axis like you wanted; instead, it is mapped to the positive X axis. You might also consider what happens to a vertex on the positive Y axis if you apply this transformation. Sure enough, it is transformed to the forward vector.
Therefore it seems like something very fishy is going on with the GLM algorithm. However, I doubt this algorithm is incorrect since it is so popular. What am I missing?
Have a look at GLU source code in Mesa: http://cgit.freedesktop.org/mesa/glu/tree/src/libutil/project.c
First, in the implementation of gluPerspective, notice that the -1 is using the indices [2][3] and the -2 * zNear * zFar / (zFar - zNear) is using [3][2]. This implies that the indexing is [column][row].
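To make the indexing convention concrete, here is a minimal sketch. The `Mat4` alias and the `entry` helper are hypothetical names introduced only for illustration (they are not part of GLU or GLM); the sketch simply assumes the same [column][row] storage that the GLU code implies.

```cpp
#include <array>
#include <cassert>

// Hypothetical 4x4 matrix type for illustration only: the first index
// selects the column and the second selects the row, as in the
// [column][row] convention discussed above.
using Mat4 = std::array<std::array<double, 4>, 4>;

// Access the entry at mathematical (row, col) in a [column][row] matrix.
inline double& entry(Mat4& m, int row, int col)
{
    assert(row >= 0 && row < 4 && col >= 0 && col < 4);
    return m[col][row];  // note the swapped order
}
```

With this helper, writing the -1 of a perspective matrix at mathematical row 3, column 2 means calling `entry(m, 3, 2) = -1.0`, which stores the value in `m[2][3]`, matching the indices seen in gluPerspective.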
Now, in the implementation of gluLookAt, the first row is set to side, the next one to up and the final one to -forward. This gives you the rotation matrix, which is post-multiplied by the translation that brings the eye to the origin.
GLM seems to be using the same [column][row] indexing (from the code). And the piece you just posted for lookAt is consistent with the more standard gluLookAt (including the translational part). So at least GLM and GLU agree.
Let's then derive the full construction step by step. Denote by C the center position and by E the eye position.

1. Move the whole scene to put the eye position at the origin, i.e. apply a translation of -E.

2. Rotate the scene to align the axes of the camera with the standard (x, y, z) axes.

2.1 Compute a positive (right-handed) orthonormal basis for the camera:

    f = normalize(C - E) (pointing towards the center)
    s = normalize(f x u) (pointing to the right side of the eye)
    u = s x f            (pointing up)

With this, (s, u, -f) is a positive orthonormal basis for the camera.

2.2 Find the rotation matrix R that maps the (s, u, -f) axes to the standard ones (x, y, z). The inverse rotation matrix R^-1 does the opposite and maps the standard axes to the camera ones, which by definition means that:

           (sx ux -fx)
    R^-1 = (sy uy -fy)
           (sz uz -fz)

Since R^-1 = R^T, we have:

        ( sx  sy  sz)
    R = ( ux  uy  uz)
        (-fx -fy -fz)

3. Combine the translation with the rotation. A point M is mapped by the "look at" transform to R (M - E) = R M - R E = R M + t, with t = -R E. So the final 4x4 transform matrix for "look at" is indeed:

        ( sx  sy  sz  tx )   ( sx  sy  sz  -s.E )
    L = ( ux  uy  uz  ty ) = ( ux  uy  uz  -u.E )
        (-fx -fy -fz  tz )   (-fx -fy -fz   f.E )
        (  0   0   0   1 )   (  0   0   0     1 )
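The derivation can be checked numerically with a small self-contained sketch. This is not the GLM implementation: `Vec3`, `LookAt`, `makeLookAt`, `transformPoint` and `transformDirection` are hypothetical names introduced here, and the transform is applied as R (M - E) directly instead of building the 4x4 matrix. It assumes that the up vector is not parallel to the viewing direction.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalize(Vec3 v)
{
    float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);  // a zero-length vector has no direction
    return {v.x / len, v.y / len, v.z / len};
}

// Camera basis (s, u, f) and eye position E; the rows of R are s, u, -f.
struct LookAt { Vec3 s, u, f, eye; };

inline LookAt makeLookAt(Vec3 eye, Vec3 center, Vec3 up)
{
    Vec3 f = normalize(sub(center, eye));  // towards the center
    Vec3 s = normalize(cross(f, up));      // to the right of the eye
    Vec3 u = cross(s, f);                  // up, orthogonal to s and f
    return {s, u, f, eye};
}

// Apply only the rotation R to a direction: each row of R is dotted with d.
inline Vec3 transformDirection(const LookAt& l, Vec3 d)
{
    return {dot(l.s, d), dot(l.u, d), -dot(l.f, d)};
}

// Apply the full transform to a point M: R (M - E).
inline Vec3 transformPoint(const LookAt& l, Vec3 m)
{
    return transformDirection(l, sub(m, l.eye));
}
```

For example, with eye (1, 2, 3), center (4, 2, 3) and up (0, 1, 0), `transformPoint` sends the eye to the origin and the center to (0, 0, -3), and `transformDirection` sends the forward vector to (0, 0, -1), i.e. the camera forward ends up on the -z axis.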
So when you write:
"That is, instead of finding the camera forward being mapped to the -Z axis, I find the -Z axis being mapped to the camera forward."
it is very surprising, because by construction the "look at" transform maps the camera forward axis to the -z axis. The "look at" transform should be thought of as moving the whole scene so that the camera is aligned with the standard origin and axes; that is really what it does.
Using your 2D example:
"By using the logic of GLM.lookAt, we should be able to map (1,1) to the Y axis by using a 2x2 matrix that consists of the forward vector in the first column and the side vector in the second column."
It's the opposite: following the construction I described, you need a 2x2 matrix with the forward and side vectors as rows, not columns, to map (1, 1) and the other vector to the y and x axes. By definition, the columns of a matrix are the images of the standard basis vectors under the transform. But since what you are looking for is the opposite (mapping your vectors to the standard basis vectors), you have to invert the transformation (transpose, since it's a rotation), and your reference vectors then become rows and not columns.
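The difference between the two layouts can be seen in a short sketch of the 2D example. `Vec2`, `applyAsColumns` and `applyAsRows` are hypothetical helper names used only for this illustration; the side vector is placed in the first row so that it lands on the x axis and the forward vector on the y axis.

```cpp
#include <cmath>

struct Vec2 { double x, y; };

inline double dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }

// Matrix with forward and side as COLUMNS: the result is
// v.x * forward + v.y * side, so the standard axes are sent to the camera axes.
inline Vec2 applyAsColumns(Vec2 forward, Vec2 side, Vec2 v)
{
    return {v.x * forward.x + v.y * side.x, v.x * forward.y + v.y * side.y};
}

// Matrix with side and forward as ROWS (the transpose, i.e. the inverse
// rotation): the camera axes are sent to the standard x and y axes.
inline Vec2 applyAsRows(Vec2 side, Vec2 forward, Vec2 v)
{
    return {dot(side, v), dot(forward, v)};
}
```

With forward = (sqrt(2)/2, sqrt(2)/2), side = (sqrt(2)/2, -sqrt(2)/2) and v = (1, 1), the column version gives (sqrt(2), 0), on the positive x axis, while the row version gives (0, sqrt(2)), on the positive y axis as intended.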
This related question might give some further insight into your fishy issue:
glm::lookAt vertical camera flips when z <= 0
The answer there might be of interest to you.