How to accelerate slow matrix multiplication in SymPy

Posted 2019-02-19 08:01

I was writing a tool to solve specific recurrence equations with SymPy, and found that one of the steps, involving a matrix multiplication, was taking an extraordinarily long time. For instance, if I try the following in the IPython console,

In [1]: from sympy import *

In [2]: A = Matrix(500, 500, lambda i,j: 2 + abs(i-j) if i-j in [-1, 0, 1] else 0)

In [3]: A[0:7, 0:7]
Out[3]: 
Matrix([
[2, 3, 0, 0, 0, 0, 0],
[3, 2, 3, 0, 0, 0, 0],
[0, 3, 2, 3, 0, 0, 0],
[0, 0, 3, 2, 3, 0, 0],
[0, 0, 0, 3, 2, 3, 0],
[0, 0, 0, 0, 3, 2, 3],
[0, 0, 0, 0, 0, 3, 2]])

In [4]: A * A
...

I've never waited long enough for it to finish.

I appreciate that symbolic manipulation is considerably slower than numerical evaluation, but this seems pretty ridiculous. Octave and other linear algebra packages can perform this calculation in a fraction of a second.

Does anyone have experience using SymPy's matrix class at sizes around 1000 x 1000, with Rational entries?

3 Answers
Viruses.
Answer #2 · 2019-02-19 08:46

Arbitrary-precision as a must, speed as a nice-to-have

There are a few pythonic classes for this purpose that may interest you for arbitrary-precision calculations:

  • decimal.Decimal()

  • fractions.Fraction( numerator = 0, denominator = 1 )

These are indispensable for precise calculations, be it for large-scale astronomy, DEP-simulations or other fields where precision must not degrade as the computation progresses over long scales of time / events / recursion and similar threats to standard number representations.
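As a minimal illustration ( my addition, not from the original answer ) of why these exact types matter: summing 0.1 ten times drifts in binary floating point, while Fraction and Decimal stay exact:

from fractions import Fraction
from decimal import Decimal, getcontext

getcontext().prec = 50                              # Decimal precision is per-context, tunable

f = sum( 0.1 for _ in range( 10 ) )                 # 0.9999999999999999 -- binary round-off
q = sum( Fraction( 1, 10 ) for _ in range( 10 ) )   # Fraction(1, 1)     -- exact
d = sum( Decimal( '0.1' ) for _ in range( 10 ) )    # Decimal('1.0')     -- exact

print( f, q, d )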

Personally I have worked with decimal.Decimal() ( at up to 5,000,000-digit precision in crypto-sequence randomness analyses ), but was not focused on having 'em "inside" a numpy matrix.

Numpy can bear these types in its matrix data structures ( via the dtype = object route, holding decimal.Decimal instances et al ), however testing is needed to verify its vectorised operations and the overall speed once handling instances of these lovely pythonic classes.
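A minimal sketch ( mine, not part of the timings below ) of how such an object-dtype matrix is built; numpy stores references to the Python objects and performs each '+' / '*' by calling the objects' own methods, so there is no BLAS fast path:

import numpy as np
from decimal import Decimal

# dtype = object: element-wise arithmetic is delegated to Decimal itself
m = np.array( [ [ Decimal( '0.1' ), Decimal( '2.3' ) ],
                [ Decimal( '1.1' ), Decimal( '3.3' ) ] ], dtype = object )

print( m.dot( m ) )                                 # exact Decimal products, object by object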

Performance

As an initial observation of numpy speed on a standard 2 x 2 matrix of decimal.Decimal-s ( "dense" sounds funny at this scale ):

>>> aClk.start();m*m;aClk.stop()
array([[Decimal('0.01524157875323883675019051999'),
        Decimal('5.502209507697009702374335655')],
       [Decimal('1.524157875323883675019051999'),
        Decimal('11.94939027587381419881628113')]], dtype=object)
5732L # 5.7 msec_______________________________________________________________

While the same on dtype = numpy.float96 took

>>> aClk.start();f*f;aClk.stop()
array([[ 0.042788046,  0.74206772],
       [ 0.10081096,  0.46544855]], dtype=float96)
2979L # 2.9 msec_______________________________________________________________

For a 500 x 500 fully-populated dtype = fractions.Fraction

>>> aClk.start();M*M;aClk.stop()
array([[Fraction(9, 64), Fraction(1, 4), Fraction(64, 25), ...,
        Fraction(64, 81), Fraction(16, 81), Fraction(36, 1)],
        ...,
       [Fraction(1, 1), Fraction(9, 4), Fraction(4, 1), ...,
        Fraction(1, 4), Fraction(25, 36), Fraction(1, 1)]], dtype=object)
2692088L # 2.7 sec_<<<_Fraction_______________________________vs. 19 msec float96

For a 500 x 500 fully-populated dense dtype = decimal.Decimal

>>> aClk.start();D*D;aClk.stop()
array([[Decimal('0.140625'), Decimal('0.25'), Decimal('2.56'), ...,
        Decimal('0.7901234567901234567901234568'),
        Decimal('0.1975308641975308641975308642'), Decimal('36')],
       [Decimal('3.24'), Decimal('0.25'), Decimal('0.25'), ...,
        Decimal('0.02040816326530612244897959185'), Decimal('0.04'),
        Decimal('0.1111111111111111111111111111')],
       [Decimal('0.1111111111111111111111111111'), Decimal('0.25'),
        Decimal('2.25'), ..., Decimal('0.5102040816326530612244897959'),
        Decimal('0.25'), Decimal('0.0625')],
       ...,
       [Decimal('0'), Decimal('5.444444444444444444444444443'),
        Decimal('16'), ..., Decimal('25'), Decimal('0.81'), Decimal('0.04')],
       [Decimal('1'), Decimal('7.111111111111111111111111113'),
        Decimal('1'), ..., Decimal('0'), Decimal('81'), Decimal('2.25')],
       [Decimal('1'), Decimal('2.25'), Decimal('4'), ..., Decimal('0.25'),
        Decimal('0.6944444444444444444444444444'), Decimal('1')]], dtype=object)
4789338L # 4.8 sec_<<<_Decimal_______________________________vs. 19 msec float96
2692088L # 2.7 sec_<<<_Fraction______________________________vs. 19 msec float96

Can a 1000 x 1000 tri-diagonal matrix be expected to go under 50 msec?

As there are only 3,000 non-zero elements ( in a sparse representation ), as opposed to the 250,000 cells in the fully-populated 500 x 500 matrices tested above, there is a huge performance jump from using these pythonic classes for your arbitrary-precision computations. Fraction has more headroom, since the engine can exploit the numerator/denominator construction for MUL / DIV operations, which Decimal cannot; however, the exact performance envelopes shall be tested in-vivo, on the real cases your computing approach is using.
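For a feel of what a purely numeric sparse multiply costs at this size ( my own sketch, assuming scipy is installed; not part of the measurements above ), the float case typically lands well under 50 msec:

import numpy as np
from scipy import sparse
import time

n    = 1000
main = np.full( n,     2.0 )                        # tri-diagonal pattern: 3*n - 2 non-zeros
off  = np.full( n - 1, 3.0 )
T    = sparse.diags( [ off, main, off ], offsets = [ -1, 0, 1 ], format = 'csr' )

t0 = time.perf_counter()
T2 = T @ T
print( "%.2f msec, nnz = %d" % ( ( time.perf_counter() - t0 ) * 1e3, T2.nnz ) )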

Sympy syntax for SparseMatrix & tests on 1000x1000 Tri-Diagonals

A real test on a 1000 x 1000 SparseMatrix, as proposed by @tmyklebu, took a bit longer to put together due to installation troubles, however it may give you some further insight into real-world implementation projects:

>>> F = sympy.SparseMatrix( 1000, 1000, { (0,0): 1} )        # .Fraction()
>>> D = sympy.SparseMatrix( 1000, 1000, { (0,0): 1} )        # .Decimal()

>>> for i in range( 1000 ):                                  # GEN to have F & D hold
...     for j in range( 1000 ):                              #     SAME values,
...         if i-j in [-1,0,1]:                              # but DIFF representations
...            num = int( 100 * numpy.random.random() )      #     
...            den = int( 100 * numpy.random.random() ) + 1  # + 1 to avoid DIV!0
...            F[i,j] = fractions.Fraction( numerator = num, denominator = den )
...            D[i,j] = decimal.Decimal( str( num ) ) / decimal.Decimal( str( den ) )

# called in Zig-Zag(F*F/D*D/F*F/D*D/...) order to avoid memory-access cache artifacts

>>> aClk.start();VOID=F*F;aClk.stop()
770353L                                      # notice the 1st eval took about 3x longer
205585L                                      # while the 2nd and later evals settle
205364L # 0.205 sec_<<<_Fraction()____________________________vs. 0.331 sec Decimal()


>>> aClk.start();VOID=D*D;aClk.stop()
383137L # 0.383 sec_<<<_Decimal()____________________________vs. 0.770 sec 1st Fraction()
390164L # 0.390 sec_<<<_Decimal()____________________________vs. 0.205 sec 2nd Fraction()
331291L # 0.331 sec_<<<_Decimal()____________________________vs. 0.205 sec 3rd Fraction()

>>> F[0:4,0:4]
Matrix([
[ 1/52,  6/23,     0,     0],
[42/29, 29/12,     1,     0],
[    0, 57/88, 39/62, 13/57],
[    0,     0, 34/83, 26/95]])
>>> D[0:4,0:4]
Matrix([
[0.0192307692307692, 0.260869565217391,                 0,                 0],
[  1.44827586206897,  2.41666666666667,               1.0,                 0],
[                 0, 0.647727272727273, 0.629032258064516, 0.228070175438596],
[                 0,                 0, 0.409638554216867, 0.273684210526316]])
唯我独甜
Answer #3 · 2019-02-19 08:49

I haven't used sympy for matrix operations, but I can reproduce the slowness you are experiencing with this code. It seems that the matrix operations in sympy are not that great.

I recommend using numpy instead, which has excellent matrix support and is very fast. Here is your code translated to numpy; it does the multiplication in under 1 sec on my laptop:

In [1]: import numpy as np

In [2]: A = np.matrix([[2 + abs(i-j) if i-j in [-1, 0, 1] else 0 for i in range(0, 500)] for j in range(0, 500)])

In [3]: A[0:7,0:7]
Out[3]:
matrix([[2, 3, 0, 0, 0, 0, 0],
        [3, 2, 3, 0, 0, 0, 0],
        [0, 3, 2, 3, 0, 0, 0],
        [0, 0, 3, 2, 3, 0, 0],
        [0, 0, 0, 3, 2, 3, 0],
        [0, 0, 0, 0, 3, 2, 3],
        [0, 0, 0, 0, 0, 3, 2]])

In [4]: A * A
Out[4]:
matrix([[13, 12,  9, ...,  0,  0,  0],
        [12, 22, 12, ...,  0,  0,  0],
        [ 9, 12, 22, ...,  0,  0,  0],
        ...,
        [ 0,  0,  0, ..., 22, 12,  9],
        [ 0,  0,  0, ..., 12, 22, 12],
        [ 0,  0,  0, ...,  9, 12, 13]])
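If downstream steps still need SymPy objects, one possible pattern ( my suggestion, not from this answer ) is to do the heavy multiplication in numpy and convert the result back afterwards:

import numpy as np
import sympy

n = 500
A = np.array([[2 + abs(i - j) if abs(i - j) <= 1 else 0
               for i in range(n)] for j in range(n)], dtype=np.int64)

A2 = A @ A                        # fast integer multiply; beware int64 overflow for large entries
S = sympy.Matrix(A2.tolist())     # back to a sympy Matrix for the symbolic steps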
放我归山
Answer #4 · 2019-02-19 08:53

Why not use sympy's sparse matrices instead of dense matrices? The matrices that arise when solving (linear) recurrences are typically sparse; one technique gives you a matrix with 1's on the first superdiagonal and zeroes everywhere else except the bottom row, where the recurrence coefficients go.
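A minimal sketch ( my addition, using illustrative Rational coefficients for a hypothetical 3-term recurrence ) of that companion-matrix layout, built directly as a SparseMatrix:

import sympy

# companion matrix for a(n+3) = c0*a(n) + c1*a(n+1) + c2*a(n+2)
c = [sympy.Rational(1, 2), sympy.Rational(-1, 3), sympy.Rational(2, 5)]
k = len(c)

entries = {(i, i + 1): 1 for i in range(k - 1)}            # 1's on the first superdiagonal
entries.update({(k - 1, j): c[j] for j in range(k)})       # coefficients in the bottom row

C = sympy.SparseMatrix(k, k, entries)
print(C)          # Matrix([[0, 1, 0], [0, 0, 1], [1/2, -1/3, 2/5]])
print(C * C)      # stays sparse and cheap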
