MKL Sparse BLAS segfault when transposing CSR with

I am trying to use MKL Sparse BLAS for CSR matrices with number of rows/columns on the order of 100M. My source code that seems to work fine for 10M rows/columns fails with segfault when I increase it to 100M.

I isolated the problem to the following code snippet:

void TestSegfault1() {
    float values[1] = { 1.0f };
    int col_indx[1] = { 0 };
    int rows_start[1] = { 0 };
    int rows_end[1] = { 1 };

    // Step 1. Create 1 x 100M matrix 
    // with single non-zero value at (0,0)
    sparse_matrix_t A;
    mkl_sparse_s_create_csr(
        &A, SPARSE_INDEX_BASE_ZERO, 1, 100000000, 
        rows_start, rows_end, col_indx, values);

    // Step 2. Transpose it to get 100M x 1 matrix
    sparse_matrix_t B;
    mkl_sparse_convert_csr(A, SPARSE_OPERATION_TRANSPOSE, &B);
}

This function segfaults in mkl_sparse_convert_csr with backtrace

#0  0x00000000004c0d03 in mkl_sparse_s_convert_csr_i4_avx ()
#1  0x0000000000434061 in TestSegfault1 ()

For slightly different code (but essentially the same) it has a little more detail:

#0  0x00000000008fc09b in mkl_serv_free ()
#1  0x000000000099949e in mkl_sparse_s_export_csr_data_i4_avx ()
#2  0x0000000000999ee4 in mkl_sparse_s_convert_csr_i4_avx ()

Apparently something goes bad in memory allocation. And it sure looks like some kind of integer overflow from the outside. The build of MKL I have uses MKL_INT = int = int32.

Is it indeed the case and the limit on number of rows I can have in Sparse BLAS CSR matrix is < 100M (looks more like ~65M)? Or am I doing it wrong?

EDIT 1: MKL version string is "Intel(R) Math Kernel Library Version 11.3.1 Product Build 20151021 for Intel(R) 64 architecture applications".

EDIT 2: Figured it out. There is indeed a subtle kind of integer overflow when allocating memory for internal per-thread buffers. At some point inside mkl_sparse_s_export_csr_data_i4_avx it attempts to allocate (omp_get_max_threads() + 1) * num_rows * 4 bytes; the number doesn't fit in 32-bit signed integer. Subsequent call to mkl_serv_malloc causes memory corruption and eventually segfault. One possible solution is to alter the number of OpenMP threads via omp_set_num_threads call.

回答1:

Could you check your example on last version of MKL? I run it on MKL 11.3.2 and it passed correctly for 100M matrix. However it could depend on number of threads on your machine (size of matrix mult number of threads have to be less than max int). To prevent such issue I am strongly recommend to use ilp64 version of MKL libraries Thanks, Alex

回答2:

check how this example works with the latest mkl 2019 u4. compiling the example with ILP64 mode like as follows:

icc   -I/opt/intel/compilers_and_libraries_2019/linux/mkl/include test_csr.cpp   \
-L/opt/intel/compilers_and_libraries_2019/linux/mkl/lib/intel64 -lmkl_core -lmkl_intel_ilp64 -lmkl_intel_thread -liomp5 -lpthread -lm -ldl

./a.out mkl_sparse_convert_csr passed