C++ Auto-Vectorize Matrix Multiplication loop

Posted 2019-02-20 03:43

Question:

When I compile my source code, which does basic matrix-matrix multiplication, with auto-vectorization and auto-parallelization enabled, I receive these warnings in the console:

C5002: loop not vectorized due to reason '1200'
C5012: loop not parallelized due to reason '1000'

I've read through this resource provided by MSDN which states:

Reason code 1200: Loop contains loop-carried data dependences that prevent vectorization. Different iterations of the loop interfere with each other such that vectorizing the loop would produce wrong answers, and the auto-vectorizer cannot prove to itself that there are no such data dependences.

Reason code 1000: The compiler detected a data dependency in the loop body.

I'm not sure what in my loop is causing problems. Here is the relevant portion of my source code.

// int** A, int** B, int** result, const int dimension
for (int i = 0; i < dimension; ++i) {
    for (int j = 0; j < dimension; ++j) {
        for (int k = 0; k < dimension; ++k) {
            result[i][j] = result[i][j] + A[i][k] * B[k][j];
        }   
    }
}

Any insight would be greatly appreciated.

Answer 1:

The loop-carried dependence is on result[i][j]: each iteration of the k loop reads the value that the previous iteration wrote to it.

One solution is to accumulate the sum in a temporary variable and write it to result[i][j] once, outside the innermost loop, like this:

for (int i = 0; i < dimension; ++i) {
    for (int j = 0; j < dimension; ++j) {
        auto tmp = 0;
        for (int k = 0; k < dimension; ++k) {
            tmp += A[i][k] * B[k][j];
        }
        result[i][j] = tmp;  // assumes result[i][j] was zero beforehand; use += to keep the original accumulate behavior
    }
}

This removes the dependence, since there is no longer a read-after-write of result[i][j] inside the innermost loop, and should help the vectorizer do a better job.
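To check that the rewrite does not change the result, here is a minimal sketch that puts both loop forms side by side. It assumes result is zero-initialized, and it uses std::vector<std::vector<int>> instead of the int** from the question purely so the example is self-contained. The names Matrix, multiply_in_place and multiply_with_temp are made up for illustration. Whether the compiler actually vectorizes the second form still depends on the compiler and flags, so treat this as a correctness check only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical alias for illustration; the question uses int** instead.
using Matrix = std::vector<std::vector<int>>;

// Form from the question: result[i][j] is read and written on every k iteration.
void multiply_in_place(const Matrix& A, const Matrix& B, Matrix& result, int dimension) {
    assert(static_cast<int>(result.size()) == dimension);
    for (int i = 0; i < dimension; ++i) {
        for (int j = 0; j < dimension; ++j) {
            for (int k = 0; k < dimension; ++k) {
                result[i][j] = result[i][j] + A[i][k] * B[k][j];
            }
        }
    }
}

// Form from the answer: the sum lives in a local variable and result[i][j]
// is written once per (i, j) pair, after the k loop has finished.
void multiply_with_temp(const Matrix& A, const Matrix& B, Matrix& result, int dimension) {
    assert(static_cast<int>(result.size()) == dimension);
    for (int i = 0; i < dimension; ++i) {
        for (int j = 0; j < dimension; ++j) {
            int tmp = 0;  // local accumulator, no memory access to result in the k loop
            for (int k = 0; k < dimension; ++k) {
                tmp += A[i][k] * B[k][j];
            }
            result[i][j] = tmp;  // overwrites, so it matches the form above only for a zeroed result
        }
    }
}
```

Calling both functions on the same inputs with zero-filled result matrices should give identical output. If result may hold nonzero values on entry, the last statement would need to be result[i][j] += tmp to match the original loop.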