I have some board
numpy arrays like that:
array([[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 1],
[0, 1, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 1, 0, 0]])
And I'm using the following code to find the sum of elements on each nth diagonal from -7 to 8 of the board (and the mirrored version of it).
n = 8
rate = [b.diagonal(i).sum()
for b in (board, board[::-1])
for i in range(-n+1, n)]
After some profiling, this operation is taking about 2/3 of overall running time and it seems to be because of 2 factors:
- The
.diagonal
method builds a new array instead of a view (looks like numpy 1.7 will have a new.diag
method to solve that) - The iteration is done in python inside the list comprehension
So, there are any methods to find these sums faster (possibly in the C layer of numpy)?
After some more tests, I could reduce 7.5x the total time by caching this operation... Maybe I was looking for the wrong bottleneck?
One more thing:
Just found the .trace
method that replaces the diagonal(i).sum()
thing and... There wasn't much improvement in performance (about 2 to 4%).
So the problem should be the comprehension. Any ideas?
There's a possible solution using
stride_tricks
. This is based in part on the plethora of information available in the answers to this question, but the problem is just different enough, I think, not to count as a duplicate. Here's the basic idea, applied to a square matrix -- see below for a function implementing the more general solution.Of course, I have no idea how fast this will actually be. But I bet it will be faster than a Python list comprehension.
For convenience, here's a fully general
diagonals
function. It assumes you want to move the diagonal along the longest axis:As I posted in a comment, I wouldn't go into C code.
Try to go with PyPy. Actually it's numpy support is quiet good (however it not support directly array.diagonal) - I didn't check if there is other buidin method for that. Nerveless, I tried the following code:
The results are:
diag_sum
diag_sum
b.diagonal