I am trying to use MKL Sparse BLAS for CSR matrices with number of rows/columns on the order of 100M. My source code that seems to work fine for 10M rows/columns fails with segfault when I increase it to 100M.
I isolated the problem to the following code snippet:
void TestSegfault1() {
float values[1] = { 1.0f };
int col_indx[1] = { 0 };
int rows_start[1] = { 0 };
int rows_end[1] = { 1 };
// Step 1. Create 1 x 100M matrix
// with single non-zero value at (0,0)
sparse_matrix_t A;
mkl_sparse_s_create_csr(
&A, SPARSE_INDEX_BASE_ZERO, 1, 100000000,
rows_start, rows_end, col_indx, values);
// Step 2. Transpose it to get 100M x 1 matrix
sparse_matrix_t B;
mkl_sparse_convert_csr(A, SPARSE_OPERATION_TRANSPOSE, &B);
}
This function segfaults in mkl_sparse_convert_csr with backtrace
#0 0x00000000004c0d03 in mkl_sparse_s_convert_csr_i4_avx ()
#1 0x0000000000434061 in TestSegfault1 ()
For slightly different code (but essentially the same) it has a little more detail:
#0 0x00000000008fc09b in mkl_serv_free ()
#1 0x000000000099949e in mkl_sparse_s_export_csr_data_i4_avx ()
#2 0x0000000000999ee4 in mkl_sparse_s_convert_csr_i4_avx ()
Apparently something goes bad in memory allocation. And it sure looks like some kind of integer overflow from the outside. The build of MKL I have uses MKL_INT = int = int32.
Is it indeed the case and the limit on number of rows I can have in Sparse BLAS CSR matrix is < 100M (looks more like ~65M)? Or am I doing it wrong?
EDIT 1: MKL version string is "Intel(R) Math Kernel Library Version 11.3.1 Product Build 20151021 for Intel(R) 64 architecture applications".
EDIT 2: Figured it out. There is indeed a subtle kind of integer overflow when allocating memory for internal per-thread buffers. At some point inside mkl_sparse_s_export_csr_data_i4_avx it attempts to allocate (omp_get_max_threads() + 1) * num_rows * 4 bytes; the number doesn't fit in 32-bit signed integer. Subsequent call to mkl_serv_malloc causes memory corruption and eventually segfault. One possible solution is to alter the number of OpenMP threads via omp_set_num_threads call.