I need to take the matrix product of two NumPy matrices (or other 2d arrays) containing log probabilities. The naive way, np.log(np.dot(np.exp(a), np.exp(b))), is not preferred for obvious reasons (the intermediate exponentials overflow or underflow).
Using
import numpy as np
from scipy.misc import logsumexp

res = np.zeros((a.shape[0], b.shape[1]))
for n in range(b.shape[1]):
    # broadcast b[:,n] over rows of a, sum columns
    res[:, n] = logsumexp(a + b[:, n].T, axis=1)
works, but runs about 100 times slower than np.log(np.dot(np.exp(a), np.exp(b))).
Using
logsumexp((tile(a, (b.shape[1],1)) + repeat(b.T, a.shape[0], axis=0)).reshape(b.shape[1],a.shape[0],a.shape[1]), 2).T
or other combinations of tile and reshape also work but run even slower than the loop above due to the prohibitively large amounts of memory required for realistically sized input matrices.
I am currently considering writing a NumPy extension in C to compute this, but of course I'd rather avoid that. Is there an established way to do this, or does anybody know of a less memory intensive way of performing this computation?
EDIT: Thanks to larsmans for this solution (see below for derivation):
def logdot(a, b):
    max_a, max_b = np.max(a), np.max(b)
    exp_a, exp_b = a - max_a, b - max_b
    np.exp(exp_a, out=exp_a)   # exponentiate the shifted values in place
    np.exp(exp_b, out=exp_b)
    c = np.dot(exp_a, exp_b)
    np.log(c, out=c)
    c += max_a + max_b          # add the maxima back in log-space
    return c
A quick comparison of this method to the method posted above (logdot_old) using IPython's %timeit magic yields the following:
In [1]: a = np.log(np.random.rand(1000, 2000))
In [2]: b = np.log(np.random.rand(2000, 1500))
In [3]: x = logdot(a, b)
In [4]: y = logdot_old(a, b)  # this takes a while
In [5]: np.any(np.abs(x - y) > 1e-14)
Out[5]: False
In [6]: %timeit logdot_old(a, b)
1 loops, best of 3: 1min 18s per loop
In [7]: %timeit logdot(a, b)
1 loops, best of 3: 264 ms per loop
Obviously larsmans' method obliterates mine!
Suppose A.shape==(n,r) and B.shape==(r,m). In computing the matrix product C=A*B, there are actually n*m summations. To get stable results when you're working in log-space, you need the logsumexp trick in each of these summations. Fortunately, using numpy broadcasting it's quite easy to control the stability of the rows of A and the columns of B separately. Here is the code:
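(The answer's code block is missing from this copy; below is a sketch of the approach it describes - per-row maxima of A and per-column maxima of B, broadcast via keepdims - assuming it is meant as a drop-in replacement for the logdot above. The name logdotexp is an assumption.)

def logdotexp(A, B):
    max_A = np.max(A, axis=1, keepdims=True)   # per-row maxima of A, shape (n, 1)
    max_B = np.max(B, axis=0, keepdims=True)   # per-column maxima of B, shape (1, m)
    C = np.dot(np.exp(A - max_A), np.exp(B - max_B))
    np.log(C, out=C)
    C += max_A + max_B                          # broadcasts back to shape (n, m)
    return C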
Note:
The reasoning behind this is similar to Fred Foo's answer, but he used a single maximum value for each matrix. Since that does not cover every one of the n*m summations, some elements of the final matrix can still be unstable, as mentioned in one of the comments.

Comparing with the currently accepted answer using @identity-m's counter-example makes this visible: the single-max version is unstable on it, while the row/column-stabilized version prints the correct values.
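(The original comparison snippet and its printed output are not reproduced here. As an illustration only - this is not @identity-m's input - an input of the same flavor, where one row of a sits far below the global maximum, shows the failure of the single-max logdot from the question's edit against the logdotexp sketch above:)

a = np.array([[0.0, 0.0],
              [-2000.0, -2000.0]])
b = np.array([[0.0],
              [0.0]])

print(logdot(a, b))     # single global max: the second entry underflows to -inf
print(logdotexp(a, b))  # per-row/per-column maxima: [[0.693...], [-1999.306...]]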
You are accessing columns of res and b, which gives poor locality of reference. One thing to try is to store these in column-major order.

logsumexp works by evaluating the right-hand side of the equation

    log(sum(exp(a))) = max(a) + log(sum(exp(a - max(a))))

I.e., it pulls out the max before starting to sum, to prevent overflow in exp. The same can be applied before doing vector dot products:

    log(dot(exp(a), exp(b)))
      = log(sum(exp(a[i]) * exp(b[i]) for i))
      = log(sum(exp(a[i] + b[i]) for i))
      = logsumexp(a + b)

but by taking a different turn in the derivation, we obtain

    log(dot(exp(a), exp(b)))
      = max(a) + max(b) + log(sum(exp(a[i] - max(a)) * exp(b[i] - max(b)) for i))
      = max(a) + max(b) + log(dot(exp(a - max(a)), exp(b - max(b))))
The final form has a vector dot product in its innards. It also extends readily to matrix multiplication, so we get the algorithm
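(The code block is missing here; the logdot shown in the question's edit above is this algorithm with the in-place optimisations below already applied. A plain version might look like:)

def logdot(a, b):
    max_a, max_b = np.max(a), np.max(b)
    exp_a, exp_b = np.exp(a - max_a), np.exp(b - max_b)   # both factors are now <= 1
    c = np.dot(exp_a, exp_b)
    return np.log(c) + max_a + max_b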
This creates two A-sized temporaries and two B-sized ones, but one of each can be eliminated by computing the shifted exponential in place (see the snippet below), and similarly for B.
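(A minimal sketch of that elimination, which is exactly what the question's logdot does:)

exp_a = a - max_a
np.exp(exp_a, out=exp_a)   # exponentiate in place, reusing the a-sized temporary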
(If the input matrices may be modified by the function, all the temporaries can be eliminated.)

The currently accepted answer by Fred Foo and Hassan's answer are both numerically unstable (Hassan's answer is better). An example of an input on which Hassan's answer fails will be provided later. My implementation is as follows:
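(The answer's code block is missing from this copy; the following sketch matches the description below - materialise the ϴ×R×I tensor of pairwise sums log_A[t,r] + log_B[r,i], then reduce it with logsumexp over the R axis. Variable names other than log_pairwise_products are assumptions.)

import numpy as np
from scipy.special import logsumexp

def logmatmulexp(log_A, log_B):
    # log_A has shape (ϴ, R), log_B has shape (R, I);
    # returns log(exp(log_A) @ exp(log_B)) of shape (ϴ, I)
    assert log_A.shape[1] == log_B.shape[0]
    # shape (ϴ, R, I): entry [t, r, i] is log_A[t, r] + log_B[r, i]
    log_pairwise_products = log_A[:, :, None] + log_B[None, :, :]
    return logsumexp(log_pairwise_products, axis=1)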
Just like Hassan's answer and Fred Foo's answer, my answer has time complexity O(ϴRI). Their answers have space complexity O(ϴR+RI) (I am not actually sure about this), while mine unfortunately has space complexity O(ϴRI) - this is because their answers go through np.dot, which can multiply a ϴ×R matrix by an R×I matrix without allocating an additional ϴ×R×I array, whereas mine materialises that array explicitly. Having O(ϴRI) space complexity is not an inherent property of my method - I think if you wrote it out with explicit loops you could avoid it, but unfortunately I don't think you can do that using stock numpy functions.
I checked how long my code actually takes to run; it is about 20 times slower than regular matrix multiplication.
Here's how you can know that my answer is numerically stable: the logsumexp function is known to be numerically stable, and the only other operation in logmatmulexp is adding the two log-matrices, so the logmatmulexp function is numerically stable as well.

My implementation has another nice property. If instead of using numpy you write the same code in pytorch or using another library with automatic differentiation, you will get a numerically stable backward pass automatically. Here's how we can know the backward pass will be numerically stable:
besides additions, the only operations involved are taking a maximum (np.max) and logsumexp, and both have numerically stable gradients (the gradient of logsumexp is the softmax).
Below is the same code in pytorch (in case you need backpropagation). Due to how pytorch backpropagation works, during the forward pass it will save the log_pairwise_products tensor for the backward pass. This tensor is large, and you probably don't want it to be saved; you can just recalculate it during the backward pass. In that case I suggest you use checkpointing - it's really easy - see the second function below.

Here's an input on which Hassan's implementation fails but my implementation gives the correct output:
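(The original input matrices are not reproduced in this copy. An illustrative input of the kind described - constructed here, not the answer's original - on which the per-row/per-column stabilization of the logdotexp sketch above still underflows while logmatmulexp stays finite:)

log_A = np.array([[0.0, -1000.0]])
log_B = np.array([[-1000.0],
                  [0.0]])

print(logdotexp(log_A, log_B))     # row and column maxima are both 0, the products underflow: [[-inf]]
print(logmatmulexp(log_A, log_B))  # correct result, about [[-999.3069]] = -1000 + log(2)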
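(And here is a sketch of the pytorch version described above. The first function is the direct translation of the numpy sketch; the second wraps it in torch.utils.checkpoint so the large log_pairwise_products tensor is recomputed during the backward pass instead of being stored. Function names are assumptions.)

import torch
from torch.utils.checkpoint import checkpoint

def logmatmulexp_torch(log_A, log_B):
    # log_A: (ϴ, R), log_B: (R, I); returns log(exp(log_A) @ exp(log_B)) of shape (ϴ, I)
    log_pairwise_products = log_A.unsqueeze(2) + log_B.unsqueeze(0)  # shape (ϴ, R, I)
    return torch.logsumexp(log_pairwise_products, dim=1)

def logmatmulexp_torch_checkpointed(log_A, log_B):
    # same result, but the (ϴ, R, I) tensor is not kept for backward;
    # it is recomputed during backpropagation
    return checkpoint(logmatmulexp_torch, log_A, log_B)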