I've been reading a lot about this, and the more I read, the more confused I get.
My understanding: in row-major order, rows are stored contiguously in memory; in column-major order, columns are stored contiguously. So if we have a sequence of numbers [1, ..., 9] and we want to store them in a row-major matrix, we get:
|1, 2, 3|
|4, 5, 6|
|7, 8, 9|
while the column-major (correct me if I'm wrong) is:
|1, 4, 7|
|2, 5, 8|
|3, 6, 9|
which is effectively the transpose of the previous matrix.
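In code terms, that's the same flat array read through two different index formulas (a minimal sketch, plain arithmetic only, no particular library):
```
#include <stdio.h>

int main(void) {
    /* The same flat sequence of nine numbers... */
    int data[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    int n = 3; /* 3x3 */

    /* ...read as row-major: element (i, j) sits at data[i*n + j] */
    printf("row-major    (1,2) = %d\n", data[1 * n + 2]); /* 6 */

    /* ...read as column-major: element (i, j) sits at data[i + j*n] */
    printf("column-major (1,2) = %d\n", data[1 + 2 * n]); /* 8 */
    return 0;
}
```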
My confusion: well, I don't see any difference. If we iterate over both matrices (by rows in the first one, and by columns in the second one), we'll cover the same values in the same order: 1, 2, 3, ..., 9.
Even matrix multiplication is the same: we take the first matrix's contiguous elements and multiply them with the second matrix's columns. So say we have the matrix M:
|1, 0, 4|
|5, 2, 7|
|6, 0, 0|
If we multiply the previous row-major matrix R with M, that is R x M, we'll get:
|1*1 + 2*5 + 3*6, 1*0 + 2*2 + 3*0, etc|
|etc.. |
|etc.. |
If we multiply the column-major matrix C with M, that is C x M, by taking the columns of C instead of its rows, we get exactly the same result as R x M.
I'm really confused: if everything is the same, why do these two terms even exist? I mean, even in the first matrix R, I could look at the rows and consider them columns...
Am I missing something? What does row-major vs. column-major actually imply for my matrix math? I've always learned in my linear algebra classes that we multiply rows from the first matrix with columns from the second one. Does that change if the first matrix is in column-major? Do we now have to multiply its columns with columns from the second matrix, like I did in my example, or was that just flat-out wrong?
Any clarifications are really appreciated!
EDIT: One of the other main sources of confusion I'm having is GLM... So I hover over its matrix type and hit F12 to see how it's implemented, and there I see a vector array, so if we have a 3x3 matrix we have an array of 3 vectors. Looking at the type of those vectors I saw 'col_type', so I assumed that each one of those vectors represents a column, and thus we have a column-major system, right?
Well, I don't know, to be honest. I wrote this print function to compare my translation matrix with glm's: I see the translation vector in glm in the last row, while mine is in the last column...
This adds nothing but more confusion. You can clearly see that each vector in the glmTranslate matrix represents a row in the matrix. So... that means the matrix is row-major, right? What about my matrix? (I'm using a float array[16].) The translation values are in the last column, so does that mean my matrix is column-major and I didn't know it? *tries to stop head from spinning*
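For concreteness, here is a minimal sketch of such a print function (not the exact code from the post, just standard glm usage; glmTranslate is the variable name from above):
```
#include <stdio.h>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

/* glm::mat4 is an array of 4 column vectors: m[c] is the c-th column. */
static void printMat(const glm::mat4& m) {
    for (int c = 0; c < 4; ++c)
        printf("vec %d: (%g, %g, %g, %g)\n", c, m[c][0], m[c][1], m[c][2], m[c][3]);
}

int main() {
    glm::mat4 glmTranslate = glm::translate(glm::mat4(1.0f), glm::vec3(10.0f, 20.0f, 30.0f));
    /* The translation lands in the last stored vector, m[3] = (10, 20, 30, 1).
       Printed one stored vector per line, it *looks* like the last row,
       even though it is really the last column. */
    printMat(glmTranslate);
    return 0;
}
```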
A short addendum to the above answers. In terms of C, where memory is accessed almost directly, row-major or column-major order affects your program in two ways: 1. the layout of your matrix in memory, and 2. the order of element access you have to maintain, in the form of how you order your loops.
eulerworks' answer points out that in his example, using a row-major matrix brought about a significant slowdown in the calculation. He is right, but the result can just as easily be reversed.
The loop order was for(over rows) { for(over columns) { do something over the matrix } }. This means that the nested loop accesses the elements of one row and then moves on to the next row. For example: A(0,1) -> A(0,2) -> A(0,3) -> ... -> A(0,N_COLS-1) -> A(1,0) -> ...
In such a case, if A is stored in row-major format there will be minimal cache misses, since the elements of a row are lined up linearly in memory. In column-major format, by contrast, the memory accesses jump around with a stride of N_ROWS. So row-major is faster in this case.
Now, we can actually switch the loops, so that they run for(over columns) { for(over rows) { do something over the matrix } }. In this case the result is exactly the opposite: the column-major calculation will be faster, since the loop now reads the elements of each column linearly.
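For example, a minimal sketch with a plain 2D array (which C/C++ lay out row-major; the sizes here are arbitrary):
```
#include <stdio.h>

#define N_ROWS 1024
#define N_COLS 1024

static double A[N_ROWS][N_COLS];

int main(void) {
    double total = 0.0;
    int i, j;

    /* Row order: the inner loop walks contiguous memory (stride 1). */
    for (i = 0; i < N_ROWS; i++)
        for (j = 0; j < N_COLS; j++)
            total += A[i][j];

    /* Column order: consecutive accesses are N_COLS elements apart,
       so on this row-major layout each step may land in a new cache line. */
    for (j = 0; j < N_COLS; j++)
        for (i = 0; i < N_ROWS; i++)
            total += A[i][j];

    printf("%f\n", total);
    return 0;
}
```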
Hence, you might as well remember this:
1. Selecting row-major or column-major storage is a matter of taste, even though the traditional C programming community seems to prefer row-major.
2. Although you are free to choose whichever you like, you need to be consistent with your indexing convention.
3. Quite importantly, when writing your own algorithms, try to order the loops so that they honor the storage format of your choice.
4. Be consistent.
Let's look at algebra first; algebra doesn't even have a notion of "memory layout" and stuff.
From an algebraic point of view, an MxN real matrix can act on an ℝ^N vector on its right side and yield an ℝ^M vector.
Thus, if you were sitting in an exam and given an MxN matrix and an ℝ^N vector, you could multiply them with trivial operations and get a result; whether that result is right or wrong will not depend on whether the software your professor uses to check your results internally uses a column-major or a row-major layout. It will only depend on whether you calculated the contraction of each row of the matrix with the (single) column of the vector properly.
To produce a correct output, the software will - by whatever means - essentially have to contract each row of the Matrix with the column vector, just like you did in the exam.
Thus, the difference between software that uses a column-major layout and software that uses a row-major layout is not what it calculates, but just how.
To put it more precisely, the difference between those layouts, with regard to that single row's contraction with the column vector, is just the means of determining where each element lives: a start offset and a "distance to the next bucket" (the stride), which differ between the two layouts.
And that's it.
To show you how that column/row magic is summoned in practice:
You haven't tagged your question with "c++", but because you mentioned 'glm', I assume that you can get along with C++.
In C++'s standard library there's an infamous beast called std::valarray, which, besides other tricky features, has overloads of operator[]; one of them can take a std::slice (which is essentially a very boring thing, consisting of just three integer-type numbers). This little slice thing, however, has everything one would need to access a row-major storage column-wise or a column-major storage row-wise: it has a start, a length, and a stride, the latter representing the "distance to the next bucket" I mentioned.
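For example (a minimal sketch, reusing the question's 3x3 values stored row-major):
```
#include <valarray>
#include <cstdio>

int main() {
    // The question's matrix, stored row-major in one flat valarray.
    std::valarray<int> m = {1, 2, 3,
                            4, 5, 6,
                            7, 8, 9};

    // std::slice(start, length, stride): start at index 1, take 3 elements,
    // stepping 3 each time, i.e. the second *column* of the row-major storage.
    std::valarray<int> col1 = m[std::slice(1, 3, 3)];

    for (int v : col1)
        std::printf("%d ", v);   // prints: 2 5 8
    std::printf("\n");
    return 0;
}
```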
I think you are mixing up an implementation detail with usage, if you will.
Let's start with a two-dimensional array, or matrix. The problem is that computer memory is a one-dimensional array of bytes. To make our discussion easier, let's group the single bytes into groups of four, so that (assuming a 32-bit integer) each group of four bytes holds one integer value; in other words, memory is just a flat sequence of integer-sized cells, one after the other.
So, the question is how to map a two dimensional structure (our matrix) onto this one dimensional structure (i.e. memory). There are two ways of doing this.
Row-major order: in this order we put the first row into memory first, then the second, and so on. Doing this, we would have in memory the following: $M_{00}, M_{01}, \dots, M_{0,nCol-1}, M_{10}, M_{11}, \dots$
With this method, we can find a given element of our array by performing the following arithmetic. Suppose we want to access the $M_{ij}$ element of the array. If we assume that we have a pointer to the first element of the array, say ptr, and know the number of columns, say nCol, we can find any element by: *(ptr + i*nCol + j). To see how this works, consider $M_{02}$ (i.e. first row, third column -- remember C is zero-based): *(ptr + 0*nCol + 2) = *(ptr + 2), so we access the third element of the array.
Column-major ordering: in this order we put the first column into memory first, then the second, and so on. Doing this, we would have in memory the following: $M_{00}, M_{10}, \dots, M_{nRow-1,0}, M_{01}, M_{11}, \dots$, and element $M_{ij}$ is found by the analogous formula *(ptr + j*nRow + i).
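In code, the two mappings are just two little index functions (a minimal sketch; nRow and nCol are the matrix dimensions):
```
#include <stdio.h>
#include <stddef.h>

/* Row-major: rows are contiguous, so the next column is 1 element away
   and the next row is nCol elements away. */
size_t rowMajorIndex(size_t i, size_t j, size_t nCol) {
    return i * nCol + j;
}

/* Column-major: columns are contiguous, so the next row is 1 element away
   and the next column is nRow elements away. */
size_t colMajorIndex(size_t i, size_t j, size_t nRow) {
    return i + j * nRow;
}

int main(void) {
    /* M_{02} in a 3x3 matrix: index 2 in row-major storage, 6 in column-major. */
    printf("%zu %zu\n", rowMajorIndex(0, 2, 3), colMajorIndex(0, 2, 3));
    return 0;
}
```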
So, the short answer: row-major and column-major formats describe how two- (or higher-) dimensional arrays are mapped onto a one-dimensional array of memory.
Hope this helps. T.
Ok, so given that the word "confusion" is literally in the title I can understand the level of...confusion.
Firstly, this absolutely is a real problem.
Never, EVER succumb to the idea that "it used to matter, but PCs nowadays..."
The primary issues here are:
- Cache eviction strategy (LRU, FIFO, etc.), as @Y.C.Jung was beginning to touch on
- Branch prediction
- Pipelining (its depth, etc.)
- Actual physical memory layout
- Size of memory
- Architecture of the machine (ARM, MIPS, Intel, AMD, Motorola, etc.)
This answer will focus on the Von Neumann machine with a modified Harvard architecture, as it is most applicable to the current PC.
The memory hierarchy:
https://en.wikipedia.org/wiki/File:ComputerMemoryHierarchy.svg
is a juxtaposition of cost versus speed.
For today's standard PC system this would be something like:
SIZE: 500GB HDD > 8GB RAM > L2 Cache > L1 Cache > Registers.
SPEED: 500GB HDD < 8GB RAM < L2 Cache < L1 Cache < Registers.
This leads to the idea of temporal and spatial locality: one is about reusing the same data (code, working set, etc.) close together in time, the other is about touching data that sits physically close together in memory.
Given that "most" of today's PC's are little-endian (Intel) machines as of late, they lay data into memory in a specific little-endian ordering. It does differ from big-endian, fundamentally.
https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html covers it rather swiftly ;). (For the simplicity of this example, I am going to 'say' that things happen in single entries; this is incorrect, as entire cache blocks are typically accessed, and they vary drastically by manufacturer, let alone model.)
So, now that we have that out of the way: suppose, hypothetically, your program demanded 1GB of data from your 500GB HDD, which was loaded into your 8GB of RAM, then into the cache hierarchy, and eventually into registers. Your program reads the first entry from your freshest cache line, only to find that the second entry it wants (in YOUR code) happens to be sitting in the next cache line (i.e. in the next ROW instead of the next column): that's a cache MISS. Assuming the cache is full, because it is small, a line must then be evicted according to the eviction scheme to make room for the line that does hold the next data you need. If this pattern repeats, you get a MISS on EVERY attempted data retrieval!
Worse, you would be evicting lines that actually have valid data you are about to need, so you will have to retrieve them AGAIN and AGAIN.
The term for this is thrashing (https://en.wikipedia.org/wiki/Thrashing_(computer_science)), and it can indeed crash a poorly written / error-prone system. (Think Windows BSOD.)
On the other hand, if you had laid out the data properly (i.e. row-major for a row-wise traversal)... you WOULD still have misses! But those misses would only occur at the end of each cache line, not on EVERY attempted retrieval. This results in orders of magnitude of difference in system and program performance.
Very very simple snippet:
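Something along these lines (a sketch of the column-order traversal being timed, not the literal col_maj.c; the matrix size here is made up):
```
/* col_maj.c (sketch): fill a 2D array walking down the columns, which fights
   the row-major layout C gives the array. */
#include <stdio.h>

#define N 512

static double a[N][N];

int main(void) {
    int i, j;
    for (j = 0; j < N; j++)         /* outer loop over columns           */
        for (i = 0; i < N; i++)     /* inner loop strides N doubles away */
            a[i][j] = (double)(i * N + j);
    printf("%f\n", a[N - 1][N - 1]);
    return 0;
}
```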
Now, compile with:
gcc -g col_maj.c -o col.o
Now, run with:
time ./col.o
real 0m0.009s
user 0m0.003s
sys 0m0.004s
Now repeat for ROW major:
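Again a sketch: row_maj.c would be the same fill with the loops swapped, so the inner loop walks along a row (contiguous memory):
```
/* row_maj.c (sketch): same fill, but the inner loop walks along a row. */
#include <stdio.h>

#define N 512

static double a[N][N];

int main(void) {
    int i, j;
    for (i = 0; i < N; i++)         /* outer loop over rows             */
        for (j = 0; j < N; j++)     /* inner loop walks contiguous data */
            a[i][j] = (double)(i * N + j);
    printf("%f\n", a[N - 1][N - 1]);
    return 0;
}
```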
Compile:
gcc -g row_maj.c -o row.o
Run: time ./row.o
real 0m0.005s
user 0m0.001s
sys 0m0.003s
Now, as you can see, the Row Major one was significantly faster.
Not convinced? If you would like to see a more drastic example: make the matrix 1000000 x 1000000, initialize it, transpose it, and print it to stdout.
(Note: on a *NIX system you WILL need to set ulimit to unlimited.)
ISSUES with my answer:
- Optimizing compilers, they change a LOT of things!
- Type of system
- Please point any others out
- This system has an Intel i5 processor
Doesn't matter what you use: just be consistent!
Row major or column major is just a convention. Doesn't matter. C uses row major, Fortran uses column. Both work. Use what's standard in your programming language/environment.
Mismatching the two will !@#$ stuff up
If you use row-major addressing on a matrix stored in column-major order, you can get the wrong element, read past the end of the array, etc...
It's incorrect to say that the code to do matrix multiplication is the same for row-major and column-major.
(Of course the math of matrix multiplication is the same.) Imagine you have two matrices, X and Y, sitting in memory as flat arrays. Whether those arrays are interpreted as column-major or row-major changes which elements the code has to pick out to compute X, Y, and X*Y correctly, as the sketch below shows.
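A minimal sketch of that difference, using made-up 2x2 values:
```
#include <stdio.h>

/* Multiply 2x2 matrices stored as flat arrays of 4 numbers.
   Row-major:    element (i,j) is at  a[i*2 + j]
   Column-major: element (i,j) is at  a[i + j*2]                     */

void mulRowMajor(const double *X, const double *Y, double *Z) {
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            double s = 0.0;
            for (int k = 0; k < 2; k++)
                s += X[i * 2 + k] * Y[k * 2 + j];
            Z[i * 2 + j] = s;
        }
}

void mulColMajor(const double *X, const double *Y, double *Z) {
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            double s = 0.0;
            for (int k = 0; k < 2; k++)
                s += X[i + k * 2] * Y[k + j * 2];
            Z[i + j * 2] = s;
        }
}

int main(void) {
    /* X = |1 2|, Y = |5 6|, so mathematically X*Y = |19 22|
           |3 4|      |7 8|                          |43 50|          */
    double Xrm[4] = {1, 2, 3, 4}, Yrm[4] = {5, 6, 7, 8}, Zrm[4]; /* row-major storage    */
    double Xcm[4] = {1, 3, 2, 4}, Ycm[4] = {5, 7, 6, 8}, Zcm[4]; /* column-major storage */

    mulRowMajor(Xrm, Yrm, Zrm);
    mulColMajor(Xcm, Ycm, Zcm);

    /* Same mathematical product, different memory layouts:
       Zrm = {19, 22, 43, 50}   Zcm = {19, 43, 22, 50} */
    printf("%g %g %g %g\n", Zrm[0], Zrm[1], Zrm[2], Zrm[3]);
    printf("%g %g %g %g\n", Zcm[0], Zcm[1], Zcm[2], Zcm[3]);
    return 0;
}
```
Feed mulColMajor the row-major arrays (or vice versa) and you silently get a different matrix; that's exactly the mismatch described above.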
There's nothing deep going on here. It's just two different conventions. It's like measuring in miles or kilometers. Either works, you just can't flip back and forth between the two without converting!
Today there is no reason to use anything other than column-major order; there are several libraries that support it in C/C++ (Eigen, Armadillo, ...). Furthermore, column-major order is more natural: e.g. pictures indexed as [x, y, z] are stored slice by slice in a file, which is column-major order. While in two dimensions it may be confusing to choose the better order, in higher dimensions it is quite clear that column-major order is the only solution in many situations.
The authors of C created the concept of arrays, but perhaps they did not expect that somebody would use them as matrices. I would be shocked myself if I saw how arrays are used in a place where everything had already been worked out in Fortran with column-major order. I think that row-major order is simply an alternative to column-major order, but only in situations where it is really needed (for now I don't know of any).
It is strange that someone still creates libraries with row-major order. It is an unnecessary waste of energy and time. I hope that one day everything will be column-major order and all these confusions will simply disappear.