I needed to find the sum of all pairs of numbers in a long list of pairs. There are lots of ways to do this in Mathematica, but I was thinking of using either Plus or Total. Since Total works on lists, Map is the functional programming instrument to use there, and Apply at level 1 (@@@) is the one to use for Plus, as Plus takes the numbers to be added as arguments.
Here is some demo code (warning: save all your work before executing this!):
pairs = Tuples[Range[6000], {2}]; (* toy example *)
TimeConstrained[Plus @@@ pairs; // Timing, 30]
(* Out[4]= {21.73, Null} *)
Total /@ pairs; // Timing
(* Out[5]= {3.525, Null} *)
You might have noticed that I've added TimeConstrained to the Plus code. This is a protective measure I included for you, because the bare code brought my PC almost to its knees. In fact, the above code works for me, but if I increase the range in the first line to 7000 my computer just locks up and never comes back. Nothing works: no Alt-period, no program switching, no Ctrl-Alt-Delete, no attempts to fire up the task manager from the taskbar, no closing the laptop lid to let it sleep; really nothing.
The problem is caused by the extreme memory use of the Plus @@@ pairs line. While pairs itself takes up about 288 MB, and the list of totals half of that, the Plus line quickly consumes about 7 GB for its calculations. That exhausts my free physical memory, and anything bigger forces the use of virtual memory on disk; Mathematica and/or Windows apparently don't play nice when virtual memory is used (by the way, do Mac OS and Linux behave better?). In contrast, the Total line doesn't have a noticeable impact on the memory usage graph.
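In case you want to verify such numbers on your own machine, these standard functions will do (the exact figures will of course differ per system):

ByteCount[pairs]  (* memory taken up by the pairs list itself *)
MemoryInUse[]     (* memory currently in use by the kernel *)
MaxMemoryUsed[]   (* peak memory used by the kernel this session *)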
I have two questions:
- Given the equivalence between Plus and Total stated in the documentation ("Total[list] is equivalent to Apply[Plus,list]."), how can the extreme difference in behavior be explained? I assume it has to do with the differences between Apply and Map, but I'm curious about the internal mechanisms involved.
- I know I can restrict the memory footprint of a single command by using MemoryConstrained (see the sketch after this list), but it is a pain to have to use it everywhere you suspect Mathematica might usurp all of your system resources. Is there a global setting I can use to tell Mathematica to use physical memory only (or, preferably, a certain fraction thereof) for all of its operations? This would be extremely helpful, as this behavior has caused a handful of lockups over the last couple of weeks and it's really starting to annoy me.
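For reference, a single guarded call looks like this (the 1 GB cap and the $Failed fallback are just example choices):

MemoryConstrained[Plus @@@ pairs, 10^9, $Failed]  (* aborts and returns $Failed beyond ~1 GB *)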
I just want to add a couple of observations that may clarify the situation a bit more. As noted in the answer by @Joshua (see also the comments to this post for a similar discussion), the reason for the inefficiency is related to unpacking. My guess is that the general reason why Apply unpacks is that the compiler (Compile) has very limited support for Apply: only three heads can be used, namely List, Plus and Times. For this reason, in SystemOptions["CompileOptions"] we can see that the compile length for Apply is set to Infinity; it just does not make sense in general to even attempt auto-compiling Apply. Presumably, when the compile length is larger than the actual array dimension, Apply unpacks.
When we set "ApplyCompileLength" to a finite value, the behavior does change:
On["Packing"]
pairs=Tuples[Range[2000],{2}];
SetSystemOptions["CompileOptions"->"ApplyCompileLength"->100];
TimeConstrained[Plus@@@pairs;//Timing,30]
{0.594,Null}
Changing it back again restores the observed initial behavior:
In[34]:=
SetSystemOptions["CompileOptions" -> "ApplyCompileLength" -> Infinity];
TimeConstrained[Plus @@@ pairs; // Timing, 30]
During evaluation of In[34]:= Developer`FromPackedArray::punpack1: Unpacking array with dimensions {4000000,2}. >>
Out[35]= {2.094, Null}
Regarding your second question: perhaps the systematic way to constrain the memory is along the lines of what @Alexey Popkov did, using a master kernel to control a slave kernel that is restarted once memory gets low. I can offer a hack that is far less sophisticated but may still be of some use. The following function
ClearAll[totalMemoryConstrained];
(* HoldRest keeps body and failexpr unevaluated until needed *)
SetAttributes[totalMemoryConstrained, HoldRest];
Module[{memException},
 totalMemoryConstrained[max_, body_, failexpr_] :=
  Catch[
   MemoryConstrained[body,
    (* budget = global cap minus what the kernel already uses;
       if that is already negative, bail out with failexpr *)
    Evaluate[If[# < 0, Throw[failexpr, memException], #] &[max - MemoryInUse[]]],
    failexpr],
   memException]];
will attempt to constrain the total memory used by the kernel, not just the memory used by one particular computation. So you can try wrapping it around your top-level function call, just once. Since it relies on MemoryConstrained and MemoryInUse, it is only as good as they are. More details on how it can be used can be found in this MathGroup post. You can use $Pre to automate the application of this to your input and reduce the amount of boilerplate code.
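A minimal usage sketch (yourTopLevelCall is a placeholder; the 2 GB cap and the $Aborted fallback are example choices, not from the original post):

totalMemoryConstrained[2*10^9, yourTopLevelCall[], $Aborted]

And to have it applied to every input automatically ($Pre must hold its argument, hence the HoldAll):

$Pre = Function[input, totalMemoryConstrained[2*10^9, input, $Aborted], HoldAll];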
Plus @@@ pairs is unpacking:
In[11]:= On["Packing"]
In[12]:= pairs=Tuples[Range[6000],{2}];
In[13]:= TimeConstrained[Plus@@@pairs;//Timing,30]
During evaluation of In[13]:= Developer`FromPackedArray::punpack1: Unpacking array with dimensions {36000000,2}. >>
Out[13]= $Aborted
This will do the same thing and doesn't unpack, meaning it uses much less memory.
On["Packing"]
pairs=Tuples[Range[6000],{2}];
a = pairs[[All, 1]];b=pairs[[All, 2]];
Plus[a, b];
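Another approach that avoids unpacking (an additional note, not from the original answer) is to total at level 2 directly:

Total[pairs, {2}]; // Timing  (* row sums of the packed n x 2 array *)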
You can read more about packing in Mathematica here:
http://www.wolfram.com/technology/guide/PackedArrays/
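A quick way to check whether a given expression is packed:

Needs["Developer`"]
PackedArrayQ[pairs]        (* True: Tuples returns a packed array *)
PackedArrayQ[{1, 2, "a"}]  (* False: mixed types cannot be packed *)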
The second part of the question is really relevant for Mathematica users. I have already asked a related question in the official newsgroup and got the following answer from John Fultz:
On Thu, 10 Mar 2011 06:12:04 -0500 (EST), Alexey Popkov wrote:

Instead of MemoryConstrained I would prefer to have 'FreeMemoryConstrained' function to protect from swapping securely...

That's just not how modern operating systems work. All memory is virtual memory. Whether it's backed by RAM, disk, or some other storage medium is a detail that the operating system manages, not applications (with the exception of mechanisms like memory-mapped files). And if an application did have the ability to lock its memory into RAM, it would be quite unfriendly indeed to other applications on the system.

Would you really want an app that insisted on keeping 2 gigabytes of RAM in play for itself (or ten applications that could keep 200 megabytes each), even if the application didn't happen to be doing any computation right now and other apps were totally starved for RAM? This could lead to a total failure of the operating system itself, which is much worse than swapping.

Modern operating systems simply cannot allow apps to behave in that fashion. If they did, then instead of swap hell, you would end up with routine failures of the entire operating system itself.

Sincerely,
John Fultz
Despite this, I have implemented a function myself that checks the amount of free physical memory about 100 times per second and, when it drops below some user-defined threshold, restarts the slave kernel and executes user-defined commands in the new slave MathKernel process. This function relies on NETLink and is currently implemented only for 32-bit Windows systems. It is not very expensive and does not take considerable additional processor time, since it gets memory-related information through a call to the GlobalMemoryStatusEx function of kernel32.dll, which is quite fast.
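(A sketch of the idea only, not the actual implementation described above, which calls GlobalMemoryStatusEx directly: on Windows, free physical memory can also be read through NETLink via a standard .NET performance counter.)

Needs["NETLink`"]
InstallNET[];
(* "Memory"/"Available Bytes" is a standard Windows performance counter *)
counter = NETNew["System.Diagnostics.PerformanceCounter", "Memory", "Available Bytes"];
freePhysicalBytes = counter@NextValue[]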