What's the purpose of magic 4 of last row in m

Posted 2019-04-30 06:09

While reading a book about WebGL, I came across the following matrix description:

[image: matrix description from the book]

The book (WebGL Beginner's Guide by Diego Cantor and Brandon Jones) says this about the last row:

The mysterious fourth row: The fourth row does not bear any special meaning. Elements m4, m8, m12 are always zero. Element m16 (the homogeneous coordinate) will always be 1.

So, if the last row is always [ 0, 0, 0, 1 ], I don't understand the following:

Why does it have to be exactly [ 0, 0, 0, 1 ]? Why can't all the values be 0, or some other values?

But if you look at the source code of the glMatrix JavaScript library, specifically the translate() method of mat4 (https://github.com/toji/gl-matrix/blob/master/src/gl-matrix/mat4.js),

you can see the following:

/**
 * Translate a mat4 by the given vector not using SIMD
 *
 * @param {mat4} out the receiving matrix
 * @param {mat4} a the matrix to translate
 * @param {vec3} v vector to translate by
 * @returns {mat4} out
 */
mat4.scalar.translate = function (out, a, v) {
    var x = v[0], y = v[1], z = v[2],
        a00, a01, a02, a03,
        a10, a11, a12, a13,
        a20, a21, a22, a23;

    if (a === out) {
        out[12] = a[0] * x + a[4] * y + a[8] * z + a[12];
        out[13] = a[1] * x + a[5] * y + a[9] * z + a[13];
        out[14] = a[2] * x + a[6] * y + a[10] * z + a[14];
        out[15] = a[3] * x + a[7] * y + a[11] * z + a[15];
    } else {
        a00 = a[0]; a01 = a[1]; a02 = a[2]; a03 = a[3];
        a10 = a[4]; a11 = a[5]; a12 = a[6]; a13 = a[7];
        a20 = a[8]; a21 = a[9]; a22 = a[10]; a23 = a[11];

        out[0] = a00; out[1] = a01; out[2] = a02; out[3] = a03;
        out[4] = a10; out[5] = a11; out[6] = a12; out[7] = a13;
        out[8] = a20; out[9] = a21; out[10] = a22; out[11] = a23;

        out[12] = a00 * x + a10 * y + a20 * z + a[12];
        out[13] = a01 * x + a11 * y + a21 * z + a[13];
        out[14] = a02 * x + a12 * y + a22 * z + a[14];
        out[15] = a03 * x + a13 * y + a23 * z + a[15];
    }

    return out;
};

I want to highlight this line:

out[15] = a03 * x + a13 * y + a23 * z + a[15];

The last element (the homogeneous coordinate) is being modified, so could it end up not equal to 1.0?

So I don't really understand...

I understand that the inner 3x3 matrix represents rotations and that [ m13, m14, m15 ] is a translation vector that changes the origin position of the camera, but what about the last row, and why do I sometimes see calculations on it in libraries?

PS

I also suppose there is a similar kind of magic 3 for the 3x3 matrices used for 2D transformations, am I right?

1 answer
ゆ 、 Hurt°
Reply #2 · 2019-04-30 06:15

Let's start with a bit of theory:

In general, all transformations in OpenGL are mappings between different vector spaces. This means that a transformation t takes an element from space V and maps it to its corresponding element in space W, which can be written as

t: V ---> W

One of the simplest mappings is a linear map, which can (under some assumptions**) always be represented by a matrix. The dimensions of the matrix are given by the dimensions of the vector spaces we are working in, so a mapping from R^N to R^M will always look like this:

t: R^N ---> R^M
t(x) = A * x,  A in R^(M x N)

where A is a matrix with M rows and N columns.

In OpenGL, we normally need mappings from R^3 to R^3, which means that linear mappings will always be represented by a 3x3 matrix. With such a matrix one can express at least rotations and scalings (and combinations of these***). But a translation, for example, cannot be represented by a 3x3 matrix, because a linear map always sends the origin to the origin. So we have to extend our transformations to support these operations as well.

This can be achieved by using affine mappings instead of linear ones, which are defined as

t: R^N ---> R^M
t(x) = A * x + b,  where A in R^(M x N) is a linear transformation and b in R^M
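To make the difference between the two definitions concrete, here is a minimal sketch for the R^3 to R^3 case. The helper names (mulMat3Vec3, applyAffine) and the row-wise nested-array layout are assumptions chosen for this illustration and are not part of any library. It shows that a purely linear map leaves the origin where it is, while the extra vector b moves it:

```javascript
// Multiply a 3x3 matrix (array of rows) by a 3-component vector.
function mulMat3Vec3(A, x) {
  return A.map(row => row[0] * x[0] + row[1] * x[1] + row[2] * x[2]);
}

// Affine map t(x) = A * x + b.
function applyAffine(A, b, x) {
  const Ax = mulMat3Vec3(A, x);
  return [Ax[0] + b[0], Ax[1] + b[1], Ax[2] + b[2]];
}

const scale2 = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]; // uniform scaling by 2
const origin = [0, 0, 0];
const shift = [5, 0, 0];

// Linear map: the origin cannot move, whatever A is.
const linearResult = mulMat3Vec3(scale2, origin);          // [0, 0, 0]

// Affine map: the origin is moved by b.
const affineResult = applyAffine(scale2, shift, origin);   // [5, 0, 0]
const pointResult = applyAffine(scale2, shift, [1, 1, 1]); // [7, 2, 2]
```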

Using this, we can express rotations, scalings and translations from R^3 to R^3 by specifying a 3x3 matrix plus a 3D vector. Since this formulation is not very handy (it requires both a matrix and a vector, and combining multiple transformations is awkward), one normally stores the operation in a single matrix of dimension N+1, called an augmented matrix (acting on an augmented vector space):

t: R^N ---> R^M

         -A-  b       x
t(x) = [        ] * [   ]
         -0-  1       1

As you can see, the last row of the matrix is always zero except for the rightmost element, which is one. This also guarantees that the last component of the result t(x) is always 1.
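The augmented form can be sketched as follows. The helpers augment and mulMat4Vec4 are hypothetical names made up for this example, and the matrix is stored as an array of rows for readability (glMatrix itself uses a flat column-major array, as the indices in the quoted translate() suggest):

```javascript
// Build the 4x4 augmented matrix [[A, b], [0, 0, 0, 1]] as an array of rows.
function augment(A, b) {
  return [
    [A[0][0], A[0][1], A[0][2], b[0]],
    [A[1][0], A[1][1], A[1][2], b[1]],
    [A[2][0], A[2][1], A[2][2], b[2]],
    [0, 0, 0, 1],
  ];
}

// Multiply a 4x4 matrix (array of rows) by a 4-component vector.
function mulMat4Vec4(M, v) {
  return M.map(row => row.reduce((sum, m, i) => sum + m * v[i], 0));
}

const A = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]; // 90 degree rotation about z
const b = [10, 20, 30];                       // translation
const M = augment(A, b);

// The point (1, 2, 3) is extended with a trailing 1 before multiplying.
const result = mulMat4Vec4(M, [1, 2, 3, 1]);
// result is [8, 21, 33, 1]: the first three components equal A * x + b,
// and the last component is still 1 because the last row is [0, 0, 0, 1].
```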

Why does it have to be exactly [ 0, 0, 0, 1 ]? Why can't all the values be 0, or some other values?

If we didn't restrict the last row to be exactly [0,0,0,1], we would no longer have an augmented affine mapping in R^3, but a general linear mapping in R^4. Since R^4 itself is not really relevant in OpenGL and we want to keep translations included, the last row is fixed. Another point is that with a different last row, combining affine mappings by matrix multiplication would no longer work: the last component of the transformed vector would stop being 1, so the translation part of the next matrix would be scaled or dropped.
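The point about combining mappings can be checked with a short block-matrix calculation. The symbols A_1, b_1, A_2, b_2 are introduced here only for illustration and stand for the linear parts and translation vectors of two affine maps t_1 and t_2:

```latex
\begin{bmatrix} A_2 & b_2 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} A_1 & b_1 \\ 0 & 1 \end{bmatrix}
=
\begin{bmatrix} A_2 A_1 & A_2 b_1 + b_2 \\ 0 & 1 \end{bmatrix},
\qquad
t_2(t_1(x)) = A_2 (A_1 x + b_1) + b_2 = A_2 A_1 x + (A_2 b_1 + b_2).

% With an all-zero last row the trailing 1 of the vector is lost:
\begin{bmatrix} A_1 & b_1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} x \\ 1 \end{bmatrix}
=
\begin{bmatrix} A_1 x + b_1 \\ 0 \end{bmatrix}
```

The product of two augmented matrices is again an augmented matrix and represents exactly the composed affine map. In the second case, the resulting vector ends in 0 instead of 1, so the translation column of any matrix applied afterwards is multiplied by 0 and has no effect.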

One remaining problem is that we still cannot express (perspective) projections using affine mappings. If you look at a perspective projection matrix in OpenGL, you will notice that its last row is not [0,0,0,1], but the theory behind this is a totally different story (if you are interested, have a look here or here).

What about the last row, and why do I sometimes see calculations on it in libraries? The last element (the homogeneous coordinate) is being modified, so could it end up not equal to 1.0?

As already said, the last row is [0,0,0,1] only for affine mappings, not for projective ones. But sometimes it makes sense to apply transformations after a projection (for example, moving the projected image on screen), and then the last row of the matrix has to be respected. That's why most matrix libraries implement all operations in a way that works for general matrices. The line

out[15] = a03 * x + a13 * y + a23 * z + a[15];

will result in 1 as long as the last row (a03, a13, a23, a[15]) equals [0,0,0,1].
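A small experiment illustrates both cases. The function translateInPlace below is a compact re-statement of the in-place branch of the translate() code quoted in the question, not the library function itself, and the "projective" matrix is a hypothetical example that only borrows the last row [0, 0, -1, 0] found in typical OpenGL perspective matrices:

```javascript
// Re-statement of the in-place branch of the quoted translate(), operating on
// a flat array of 16 numbers in column-major order.
function translateInPlace(m, v) {
  const x = v[0], y = v[1], z = v[2];
  m[12] = m[0] * x + m[4] * y + m[8] * z + m[12];
  m[13] = m[1] * x + m[5] * y + m[9] * z + m[13];
  m[14] = m[2] * x + m[6] * y + m[10] * z + m[14];
  m[15] = m[3] * x + m[7] * y + m[11] * z + m[15];
  return m;
}

// Affine case: identity matrix, last row (m[3], m[7], m[11], m[15]) = [0, 0, 0, 1].
const affine = translateInPlace(
  [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1],
  [2, 3, 4]
);
// affine[15] stays 1; the translation lands in affine[12], affine[13], affine[14].

// Projective case (hypothetical matrix): last row is [0, 0, -1, 0].
const projective = translateInPlace(
  [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, -1,  0, 0, 0, 0],
  [2, 3, 4]
);
// projective[15] becomes -4, so the last element is no longer 1.
```

This is why a general-purpose library cannot simply hard-code out[15] = 1: doing so would be correct for affine matrices but wrong for anything involving a projection.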

Since this post already got a lot longer than I thought, I'd better stop here, but if you have any further questions, just ask and I will try to add something to the answer.

Footnotes:

** Works when both spaces are finite-dimensional vector spaces and a basis is defined for both of them.

*** Combinations, since the composition of linear transformations is also linear, e.g., t: R^N -> R^M, u: R^M -> R^K, both linear => u(t(x)) linear
