Memory use of Apply vs Map; virtual memory use


Question:

I needed to find the sum of each pair of numbers in a long list of pairs. There are lots of ways to do this in Mathematica, but I was considering either Plus or Total. Since Total works on lists, Map is the functional programming instrument to use with it, whereas Apply at level 1 (@@@) is the one to use with Plus, since Plus takes the numbers to be added as separate arguments.
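On a small list the two forms agree, which makes the difference in behavior below all the more surprising:

Plus @@@ {{1, 2}, {3, 4}}  (* {3, 7} *)
Total /@ {{1, 2}, {3, 4}}  (* {3, 7} *)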

Here is some demo code (warning: save all your work before executing this!):

pairs = Tuples[Range[6000], {2}]; (* toy example *)

TimeConstrained[Plus @@@ pairs; // Timing, 30]

(* Out[4]= {21.73, Null} *)

Total /@ pairs; // Timing

(* Out[5]= {3.525, Null} *)

You might have noticed that I've added TimeConstrained to the Plus code. This is a protective measure I included for you, because the bare code brought my PC almost to its knees. In fact, the above code works for me, but if I increase the range in the first line to 7000 my computer just locks up and never comes back. Nothing works: no Alt+period, no program switching, no Ctrl+Alt+Delete, no attempts to fire up the process manager from the taskbar, not even closing the laptop lid to let it sleep.

The problem is caused by the extreme memory use of the Plus @@@ pairs line. While pairs itself takes up about 288 MB, and the list of totals half of that, the Plus line quickly consumes about 7 GB for its calculations. That exhausts my free physical memory, and anything bigger forces the use of virtual memory on disk. Mathematica and/or Windows apparently don't play nice when virtual memory is in use (BTW, do macOS and Linux behave better?). In contrast, the Total line doesn't have a noticeable impact on the memory usage graph.
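These figures can be checked directly; a quick sketch (the exact numbers depend on whether your kernel is 32-bit or 64-bit):

ByteCount[pairs]  (* size of the packed array; about 288 MB on a 32-bit kernel *)
MemoryInUse[]     (* memory currently in use by the kernel *)
MaxMemoryUsed[]   (* high-water mark for this session *)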

I have two questions:

  1. Given the equivalence between Plus and Total stated in the documentation ("Total[list] is equivalent to Apply[Plus, list]."), how can the extreme difference in behavior be explained? I assume it has to do with the differences between Apply and Map, but I'm curious about the internal mechanisms involved.
  2. I know I can restrict the memory footprint of a command with MemoryConstrained (see the snippet after this list), but it is a pain to have to use it everywhere you suspect Mathematica might usurp all of your system resources. Is there a global setting to tell Mathematica to use physical memory only (or, preferably, a certain fraction thereof) for all of its operations? This would be extremely helpful, as this behavior has caused a handful of lockups in the last couple of weeks and it's really starting to annoy me.
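For reference, the per-command form meant in question 2 looks like this; the 2 GB limit and the $Aborted fallback are arbitrary illustrative choices:

MemoryConstrained[Plus @@@ pairs, 2*10^9, $Aborted]  (* evaluates to $Aborted if the limit is exceeded *)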

Answer 1:

I just want to add a couple of observations that may clarify the situation a bit more. As noted in the answer by @Joshua (see also the comments to this post for a similar discussion), the inefficiency is caused by unpacking. My guess is that the general reason why Apply unpacks is that the compiler (Compile) has very limited support for Apply: only three heads can be used, namely List, Plus, and Times. For this reason, in SystemOptions["CompileOptions"] we can see that the compile length for Apply is set to Infinity; it just does not make sense in general to even attempt auto-compiling Apply. Presumably, when the compile length exceeds the actual array dimension, the array gets unpacked.
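You can inspect the current setting like this (the option lives inside the "CompileOptions" group):

"ApplyCompileLength" /. ("CompileOptions" /. SystemOptions["CompileOptions"])

(* Infinity *)

When we set "ApplyCompileLength" to a finite value, the behavior does change: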

On["Packing"]
pairs=Tuples[Range[2000],{2}];
SetSystemOptions["CompileOptions"->"ApplyCompileLength"->100];
TimeConstrained[Plus@@@pairs;//Timing,30]

{0.594,Null}

Changing it back again restores the observed initial behavior:

In[34]:= 
SetSystemOptions["CompileOptions" -> "ApplyCompileLength" -> Infinity];
TimeConstrained[Plus @@@ pairs; // Timing, 30]

During evaluation of In[34]:= Developer`FromPackedArray::punpack1: Unpacking array with dimensions {4000000,2}. >>

Out[35]= {2.094, Null}

Regarding your second question: perhaps the systematic way to constrain the memory is along the lines of what @Alexey Popkov did, using a master kernel to control a slave kernel that is restarted once memory runs low. I can offer a hack that is far less sophisticated but may still be of some use. The following function

ClearAll[totalMemoryConstrained];
(* HoldRest keeps body and failexpr unevaluated until needed *)
SetAttributes[totalMemoryConstrained, HoldRest];
Module[{memException},
  totalMemoryConstrained[max_, body_, failexpr_] :=
   Catch[
     MemoryConstrained[body,
       (* budget = requested maximum minus memory already in use;
          bail out at once if the kernel is already over budget *)
       Evaluate[
         If[# < 0, Throw[failexpr, memException], #] &@(max - MemoryInUse[])],
       failexpr],
     memException]];

will attempt to constrain the total memory used by the kernel, not just the memory used by one particular computation. So you can try wrapping it around your top-level function call, just once. Since it relies on MemoryConstrained and MemoryInUse, it is only as good as they are. More details on how it can be used can be found in this MathGroup post. You can use $Pre to automate applying it to your input and to reduce the amount of boilerplate code.
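A usage sketch; the 2 GB cap and the $Failed fallback are placeholders, not recommended values:

totalMemoryConstrained[2*10^9, Plus @@@ pairs, $Failed]

(* automate it for every input; HoldAll keeps the input unevaluated
   until it is inside the wrapper *)
$Pre = Function[expr, totalMemoryConstrained[2*10^9, expr, $Failed], HoldAll];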



Answer 2:

Plus@@@pairs is unpacking:

In[11]:= On["Packing"]
In[12]:= pairs=Tuples[Range[6000],{2}];
In[13]:= TimeConstrained[Plus@@@pairs;//Timing,30]
During evaluation of In[13]:= Developer`FromPackedArray::punpack1: Unpacking array with dimensions {36000000,2}. >>
Out[13]= $Aborted

The following does the same thing without unpacking, which means it uses much less memory:

On["Packing"]
pairs=Tuples[Range[6000],{2}];
a = pairs[[All, 1]];b=pairs[[All, 2]];
Plus[a, b];
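If no unpacking message appears, everything stayed packed; you can also check explicitly with Developer`PackedArrayQ:

Developer`PackedArrayQ /@ {pairs, a, Plus[a, b]}

(* {True, True, True}: Part extraction and listable Plus both preserve packing *)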

You can read more about packing in Mathematica here: http://www.wolfram.com/technology/guide/PackedArrays/



Answer 3:

The second part of the question is a pressing one for Mathematica users. I already asked a related question in the official newsgroup and got the following answer from John Fultz:

On Thu, 10 Mar 2011 06:12:04 -0500 (EST), Alexey Popkov wrote:

Instead of MemoryConstrained, I would prefer to have a 'FreeMemoryConstrained' function to protect against swapping reliably...

That's just not how modern operating systems work. All memory is virtual memory. Whether it's backed by RAM, disk, or some other storage medium is a detail that the operating system manages, not applications (with the exception of mechanisms like memory-mapped files). And if an application did have the ability to lock its memory into RAM, it would be quite unfriendly indeed to other applications on the system.

Would you really want an app that insisted on keeping 2 gigabytes of RAM in play for itself (or ten applications that could keep 200 megabytes each), even if the application didn't happen to be doing any computation right now and other apps were totally starved for RAM? This could lead to a total failure of the operating system itself, which is much worse than swapping.

Modern operating systems simply cannot allow apps to behave in that fashion. If they did, then instead of swap hell, you would end up with routine failures of the entire operating system itself.

Sincerely,

John Fultz

Despite this, I have implemented a function that checks the amount of free physical memory about 100 times per second and, when it drops below a user-defined threshold, restarts the slave kernel and executes user-defined commands in a new slave MathKernel process.

This function relies on NETLink and is currently implemented only for 32-bit Windows systems. It is not very expensive and does not take considerable additional processor time, since it gets memory-related information through a call to the GlobalMemoryStatusEx function in kernel32.dll, which is pretty fast.
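The MEMORYSTATUSEX struct that GlobalMemoryStatusEx fills is awkward to marshal through NETLink's DefineDLLFunction, so here is a rough sketch of the same polling idea that reads free physical memory through a standard .NET performance counter instead; the 500 MB threshold and the restart action are placeholders:

Needs["NETLink`"]
InstallNET[];

(* "Memory"/"Available Bytes" is a standard Windows performance counter *)
freeCounter = NETNew["System.Diagnostics.PerformanceCounter", "Memory", "Available Bytes"];
freePhysicalMemory[] := freeCounter@NextValue[]

(* poll frequently and react when free RAM drops below the threshold *)
RunScheduledTask[
  If[freePhysicalMemory[] < 500*10^6,
    Print["Low memory: restart the slave kernel here"]],
  0.01];  (* about 100 checks per second *)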