I need to add a scalar to all elements of a huge matrix. The matrix will be as big as possible. In the example I will use a size of 2 GiB but in my real computation it will be much larger.
A = rand(2^14, 2^14)
If I execute
A += 1.0
Julia allocates an additional 2 GiB of memory. The operation takes about 1s. I could use a for
loop:
for jj = 1:size(A, 2), ii = 1:size(A, 1)
A[ii, jj] = A[ii, jj] + 1.0
end
This does not allocate any memory, but it takes one minute. Both approaches are not viable for me, because the first one violates memory constraints and the second is clearly inefficient. For element-wise multiplication there is scal!
, which uses BLAS. Is there any way of performing addition as effciently as multiplication using scal!
?
@DSM's answer is a good one. There are a number of things going on here that I'd like to address in addition, however. The reason your for loop is slow is because
A
is a non-constant global variable and your code is directly mutating that global. SinceA
is non-constant, the code has to guard against the possibility ofA
becoming a different value with a different type at any point during the execution of the loop. The code has to look up the type and location ofA
on every iteration of the loop and dynamically dispatch the method calls in the expressionA[ii, jj] = A[ii, jj] + 1.0
– that's a call togetindex
,+
andsetindex!
, all of which depend on the statically unknown type ofA
. You can immediately get much better performance just by doing this work in a function:Avoiding non-constant globals like this is the first recommendation in the Performance Tips section of the manual. You'll probably want to peruse the rest of this chapter as well.
We can further improve the performance of the
inc!
function using the@inbounds
annotation to indicate that bounds checks aren't necessary for this code, and by using linear indexing instead of two-dimensional indexing:Most of the speedup is from the
@inbounds
annotation rather than the linear indexing, although that does give a little speed boost. The@inbounds
annotation should be used sparingly, however, and only where one is both certain that the indexing cannot be out of bounds and performance is of the utmost importance. As you can see, the additional performance improvement while existent, is not overwhelming. Most of the benefit comes from not directly mutating global variables.You could do an in-place broadcast operation:
which only uses a total of ~2G of memory.