Fastest way to create a sparse matrix of the form

I'm trying to optimize a piece of code that solves a large sparse nonlinear system using an interior point method. During the update step, this involves computing the Hessian matrix H, the gradient g, then solving for d in H * d = -g to get the new search direction.

The Hessian matrix has a symmetric tridiagonal structure of the form:

A.T * diag(b) * A + C

I've run line_profiler on the particular function in question:

Line # Hits     Time  Per Hit % Time Line Contents
==================================================
   386                               def _direction(n, res, M, Hsig, scale_var, grad_lnprior, z, fac):
   387                               
   388                                   # gradient
   389   44  1241715  28220.8    3.7     g = 2 * scale_var * res - grad_lnprior + z * np.dot(M.T, 1. / n)
   390                               
   391                                   # hessian
   392   44  3103117  70525.4    9.3     N = sparse.diags(1. / n ** 2, 0, format=FMT, dtype=DTYPE)
   393   44 18814307 427597.9   56.2     H = - Hsig - z * np.dot(M.T, np.dot(N, M))    # slow!
   394                                   
   395                                   # update direction
   396   44 10329556 234762.6   30.8     d, fac = my_solver(H, -g, fac)
   397                                   
   398   44      111      2.5    0.0     return d, fac

Looking at the output it's clear that constructing H is by far the most costly step - it takes considerably longer than actually solving for the new direction.

Hsig and M are both CSC sparse matrices, n is a dense vector and z is a scalar. The solver I'm using requires H to be either a CSC or CSR sparse matrix.

Here's a function that produces some toy data with the same formats, dimensions and sparseness as my real matrices:

import numpy as np
from scipy import sparse

def make_toy_data(nt=200000, nc=10):

    d0 = np.random.randn(nc * (nt - 1))
    d1 = np.random.randn(nc * (nt - 1))
    M = sparse.diags((d0, d1), (0, nc), shape=(nc * (nt - 1), nc * nt),
                     format='csc', dtype=np.float64)

    d0 = np.random.randn(nc * nt)
    Hsig = sparse.diags(d0, 0, shape=(nc * nt, nc * nt), format='csc',
                        dtype=np.float64)

    n = np.random.randn(nc * (nt - 1))
    z = np.random.randn()

    return Hsig, M, n, z

And here's my original approach for constructing H:

def original(Hsig, M, n, z):
    N = sparse.diags(1. / n ** 2, 0, format='csc')
    H = - Hsig - z * np.dot(M.T, np.dot(N, M))    # slow!
    return H

Timing:

%timeit original(Hsig, M, n, z)
# 1 loops, best of 3: 483 ms per loop

Is there a faster way to construct this matrix?

标签： numpy scipy linear-algebra sparse-matrix hessian-matrix

2条回答

Rolldiameter

2楼-- · 2019-07-19 16:32

I tried running your test case and had problems with the np.dot(N, M). I didn't dig into it, but I think my numpy/sparse combo (both pretty new) had problems using np.dot on sparse arrays.

But H = -Hsig - z*M.T.dot(N.dot(M)) runs just fine. This uses the sparse dot.

I haven't run a profile, but here are Ipython timings for several parts. It takes longer to generate the data than to do that double dot.

In [37]: timeit Hsig,M,n,z=make_toy_data()
1 loops, best of 3: 2 s per loop

In [38]: timeit N = sparse.diags(1. / n ** 2, 0, format='csc')
1 loops, best of 3: 377 ms per loop

In [39]: timeit H = -Hsig - z*M.T.dot(N.dot(M))
1 loops, best of 3: 1.55 s per loop

H is a

<2000000x2000000 sparse matrix of type '<type 'numpy.float64'>'
    with 5999980 stored elements in Compressed Sparse Column format>

0人赞添加讨论(0) 举报

ら.Afraid

3楼-- · 2019-07-19 16:56

I get close to a 4x speed-up in computing the product M.T * D * M out of the three diagonal arrays. If d0 and d1 are the main and upper diagonal of M, and d is the main diagonal of D, then the following code creates M.T * D * M directly:

def make_tridi_bis(d0, d1, d, nc=10):
    d00 = d0*d0*d
    d11 = d1*d1*d
    d01 = d0*d1*d
    len_ = d0.size
    data = np.empty((3*len_ + nc,))
    indices = np.empty((3*len_ + nc,), dtype=np.int)
    # Fill main diagonal
    data[:2*nc:2] = d00[:nc]
    indices[:2*nc:2] = np.arange(nc)
    data[2*nc+1:-2*nc:3] = d00[nc:] + d11[:-nc]
    indices[2*nc+1:-2*nc:3] = np.arange(nc, len_)
    data[-2*nc+1::2] = d11[-nc:]
    indices[-2*nc+1::2] = np.arange(len_, len_ + nc)
    # Fill top diagonal
    data[1:2*nc:2] = d01[:nc]
    indices[1:2*nc:2] = np.arange(nc, 2*nc)
    data[2*nc+2:-2*nc:3] = d01[nc:]
    indices[2*nc+2:-2*nc:3] = np.arange(2*nc, len_+nc)
    # Fill bottom diagonal
    data[2*nc:-2*nc:3] = d01[:-nc]
    indices[2*nc:-2*nc:3] = np.arange(len_ - nc)
    data[-2*nc::2] = d01[-nc:]
    indices[-2*nc::2] = np.arange(len_ - nc ,len_)

    indptr = np.empty((len_ + nc + 1,), dtype=np.int)
    indptr[0] = 0
    indptr[1:nc+1] = 2
    indptr[nc+1:len_+1] = 3
    indptr[-nc:] = 2
    np.cumsum(indptr, out=indptr)

    return sparse.csr_matrix((data, indices, indptr), shape=(len_+nc, len_+nc))

If your matrix M were in CSR format, you can extract d0 and d1 as d0 = M.data[::2] and d1 = M.data[1::2], I modified you toy data making routine to return those arrays as well, and here's what I get:

In [90]: np.allclose((M.T * sparse.diags(d, 0) * M).A, make_tridi_bis(d0, d1, d).A)
Out[90]: True

In [92]: %timeit make_tridi_bis(d0, d1, d)
10 loops, best of 3: 124 ms per loop

In [93]: %timeit M.T * sparse.diags(d, 0) * M
1 loops, best of 3: 501 ms per loop

The whole purpose of the above code is to take advantage of the structure of the non-zero entries. If you draw a diagram of the matrices you are multiplying together, it is relatively easy to convince yourself that the main (d_0) and top and bottom (d_1) diagonals of the resulting tridiagonal matrix are simply:

d_0 = np.zeros((len_ + nc,))
d_0[:len_] = d00
d_0[-len_:] += d11

d_1 = d01

The rest of the code in that function is simply building the tridiagonal matrix directly, as calling sparse.diags with the above data is several times slower.

0人赞添加讨论(0) 举报

Fastest way to create a sparse matrix of the form

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间