Performance decreases with increasing nesting of a numpy array

Posted 2019-07-22 14:31

Question:

A short note: this question relates to another I asked previously, but since asking multiple questions within a single Q&A is considered bad SO style, I split it up.


Setup

I have the following two implementations of a matrix-calculation:

  1. The first implementation uses a matrix of shape (n, m), and the calculation is repeated repetition times in an outer for-loop:
import numpy as np


def foo():
    # each cell is replaced by the smaller of two candidates built from
    # itself and its left neighbour
    for i in range(1, n):
        for j in range(1, m):

            _deleteA = (
                        matrix[i, j]
                        # + some constants added here
            )
            _deleteB = (
                        matrix[i, j-1]
                        # + some constants added here
            )
            matrix[i, j] = min(_deleteA, _deleteB)

    return matrix

repetition = 3
for x in range(repetition):
    foo()


  2. The second implementation avoids the extra for-loop by folding repetition = 3 into the matrix, which then has shape (repetition, n, m):

def foo():
    # same recurrence, but all `repetition` layers are updated at once
    # through the slice along axis 0
    for i in range(1, n):
        for j in range(1, m):

            _deleteA = (
                        matrix[:, i, j]
                        # + some constants added here
            )
            _deleteB = (
                        matrix[:, i, j-1]
                        # + some constants added here
            )
            # elementwise minimum of the two candidates for every layer
            # (an equivalent np.minimum form is noted after the snippet)
            matrix[:, i, j] = np.amin(np.stack((_deleteA, _deleteB), axis=1), axis=1)

    return matrix
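
A side note on the reduction in the last line of foo(): np.minimum computes the elementwise minimum of the two length-repetition slices directly, so the stacked (repetition, 2) temporary from np.stack is not needed. A minimal equivalent form of that line, assuming the same _deleteA and _deleteB as above:

            # equivalent to np.amin(np.stack((_deleteA, _deleteB), axis=1), axis=1),
            # but without building the intermediate (repetition, 2) array
            matrix[:, i, j] = np.minimum(_deleteA, _deleteB)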


Question

Comparing both implementations with %timeit in IPython, I observed the following:

  1. The first implementation is faster (in my test case with n=1000, m=1000: 17 sec vs. 26 sec). Why is numpy so much slower when working on three instead of two dimensions?
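
For reference, a self-contained sketch of how this comparison can be reproduced with the standard timeit module (the numbers above came from %timeit in IPython). The names run_2d and run_3d, the zero initialization, and the single placeholder constant C are assumptions for illustration, since the real constants and the initialization of matrix are not shown:

import timeit

import numpy as np

n, m, repetition = 1000, 1000, 3   # sizes from the timing test
C = 1.0                            # placeholder for the elided constants


def run_2d():
    # first version: 2-D matrix, the repetitions as an outer Python loop
    matrix = np.zeros((n, m))
    for _ in range(repetition):
        for i in range(1, n):
            for j in range(1, m):
                matrix[i, j] = min(matrix[i, j] + C, matrix[i, j - 1] + C)
    return matrix


def run_3d():
    # second version: 3-D matrix, the repetition axis handled by numpy slicing
    matrix = np.zeros((repetition, n, m))
    for i in range(1, n):
        for j in range(1, m):
            a = matrix[:, i, j] + C
            b = matrix[:, i, j - 1] + C
            matrix[:, i, j] = np.amin(np.stack((a, b), axis=1), axis=1)
    return matrix


print("2-D version:", timeit.timeit(run_2d, number=1), "s")
print("3-D version:", timeit.timeit(run_3d, number=1), "s")

With number=1 each variant performs the full sweep once; run_2d repeats the sweep repetition times internally, matching the driver loop of the first snippet.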