performance of fortran matrix operations

2019-09-05 03:20发布

I need to use Fortran instead of C somewhere and I am very new to Fortran. I am trying to do some big calculations but it is quite slow comparing to C (maybe 10x or more and I am using Intel's compilers for both). I think the reason is Fortran keeps the matrix in column major format, and I am trying to do operations like sum(matrix(i, j, :)), because it is column major, probably this uses the cache very inefficiently (probably not using at all). However, I am not sure if this is the actual reason (since I know so less about Fortran). Question is, the convention in Fortran is to do operations on column vectors instead of row vectors ?

(BTW: I checked the speed of Fortran already using Intel's LAPACK libraries, and it is quite fast, so it is not related to any compiler or build issue.)

Thanks.

Mete

2条回答
We Are One
2楼-- · 2019-09-05 03:32

Try changing the order of your loops when doing matrix operations, e.g. if you have something like this in C:

for (i = 0; i < M; ++i) // for each row
{
    for (j = 0; j < N; ++j) // for each col
    {
        // matrix operations on e.g. A[i][j]
    }
}

then in Fortran you want the j (column) loop as the outer loop and the i (row) loop as the inner loop.

An alternative approach, which achieves the same thing, is to keep the loops as they are but change the definition of the array, e.g. if in C it's A[x][y][z][t] then in FORTRAN make it A[t][z][y][x], assuming that t is the fastest varying loop index, and x the slowest.

查看更多
干净又极端
3楼-- · 2019-09-05 03:51

Since, as you write, Fortran is column major with the first index varying fastest in memory layout, so sum(matrix(i, j, :)) causes the summation of non-contiguous locations. If this is really the cause of slower operation, then you could redefine your matrix to have a different order of dimensions so that the current 3rd dimension is the 1st. Yes, if this is your main computation, rearrange the matrix to make the summation a column operation. Explicit looping should be as earlier indices fastest, as described by @PaulR. If you had previously thought of the optimum index order for C and are changing to Fortran, this is one aspect that might need changing. But while this is theoretically true, I doubt that it really matters that much in practice, unless perhaps the array is enormous. (The worse case would be that part of the array is in RAM and part in swap on disk!) The first rule about run-time speed issues is don't guess ... measure. It is usually the algorithm.

查看更多
登录 后发表回答