Reducing numpy memory footprint in long-running application

Published 2019-03-27 15:30

Question:

In my application, one hundred numpy arrays (1000 complex elements each) are generated and filled with data. Then, over many iterations, the array elements are modified over and over again. After the initial generation, the system monitor reports around 50 MB of RAM usage. Although I am not generating any new arrays, the footprint keeps growing by around 40 MB per iteration.

I learned here that the garbage collector does not handle numpy arrays. So I assume that some of the temporary arrays I generate to manipulate data are not collected correctly.

Here it says that guppy.hpy().heap() does not help with profiling numpy, unfortunately.

How can I identify the source of the problem and ideally keep consumption constant over any number of iterations?
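
For reference, the growth can also be watched from inside the script rather than via the system monitor; a minimal sketch using only the standard-library resource module:

import resource

def peak_rss_kb():
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss_kb()
# ... run one iteration here ...
print("peak RSS grew by %d kB" % (peak_rss_kb() - before))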

I suspect that I may be generating copies when assigning array elements as described here, which are then not garbage collected.
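
For context, a minimal illustration of the view-versus-copy distinction (the array a is made up for the example):

import numpy as np

a = np.zeros(10)

v = a[2:5]         # basic slicing returns a view; no data is copied
v[:] = 1.0         # writes through to a

c = a[[2, 3, 4]]   # fancy indexing returns a fresh copy
c[:] = 2.0         # modifies only the copy; a is untouched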

Can I manually dispose of temporary numpy arrays to assist garbage collection?

[Update 1]: Sample code

This bit of code is called thousands of times. Each time, the footprint increases. I cannot see why, because to my understanding it is only reading existing arrays and manipulating other existing arrays. Are any of these slicing operations doing something unintended? (Sorry for the line length. I can simplify it, but then I might be hiding my errors, too.)

import numpy as np
from scipy import linalg  # assumed import, given the SciPy version noted below

for ts in np.arange(numTimeslots):
    for fc in np.arange(numFreqChunks):
        interferencep = np.sum(  # interference covariance from all other cells
            np.dot(np.dot(self.baseStations[bs].cells[cell].CSI_OFDMA[:, :, fc, ts],
                          np.diag(cell.OFDMA_power[:, fc, ts])),
                   self.baseStations[bs].cells[cell].CSI_OFDMA[:, :, fc, ts].conj().T)
            for bs in self.baseStations for cell in bs.cells if cell != self._cell)
        noisep = np.eye(self.antennas) * (self.noisePower / numFreqChunks)
        self.OFDMA_interferenceCovar[:, :, fc, ts] = noisep + interferencep
        self.OFDMA_EC[:, :, fc, ts] = np.dot(  # CSI * inv(noise + interference) * CSI^H
            np.dot(self.OFDMA_CSI[:, :, fc, ts], linalg.inv(noisep + interferencep)),
            self.OFDMA_CSI[:, :, fc, ts].conj().T)
        eigs = linalg.eig(self.OFDMA_EC[:, :, fc, ts])[0]
        self.OFDMA_SINR[:, fc, ts] = np.real(eigs)

[Update 2]: For those curious, this is part of a mobile network simulator, running in a virtualenv with Python 2.7.3, NumPy 1.6.2 and SciPy 0.11.0b1.

[Update 3]: By commenting it out and checking the system monitor, I can identify the 'interferencep = ...' line as the culprit. It allocates significant memory which is not freed. But why?

Answer 1:

I had the same kind of issue. Unfortunately, I found no workaround. The only thing that worked for me was to refactor the code into small, isolated functions, written so that I could convince myself I kept no references to arrays that would prevent the garbage collector from collecting them. Take special care with array views, which are generated by slicing, etc.
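
For example, a minimal sketch of that style (the function name and arguments are illustrative, not taken from the question's code):

import numpy as np

def interference_term(csi, power):
    # Both temporaries (the diagonal matrix and the first product) are
    # local; no reference escapes, so they are freed when we return.
    weighted = np.dot(csi, np.diag(power))
    return np.dot(weighted, csi.conj().T)

The caller then holds only the returned array, never the intermediates.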

To make code more memory-efficient and less prone to memory leaks, I have often found it useful to use the out= keyword argument provided by many numpy functions.
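
A small sketch of the idea (the array names are illustrative):

import numpy as np

a = np.random.rand(1000) + 1j * np.random.rand(1000)
b = np.random.rand(1000)
result = np.empty_like(a)

for _ in range(10000):
    # result = a * b would allocate a fresh array on every pass;
    # out= writes into the preallocated buffer instead.
    np.multiply(a, b, out=result)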



Answer 2:

Using the system monitor together with code inspection and commenting lines out, I found the memory leak. It was caused by comparing a numpy array with an empty list in a different file. I will dig into that leak in a separate question and vote to delete this one, as I find it too specific to help anyone else.
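
For illustration, the offending pattern was roughly the following (the array is made up; note that newer NumPy versions warn about such comparisons, see the linked question below for the actual analysis):

import numpy as np

arr = np.zeros(1000, dtype=complex)

result = (arr == [])        # the costly pattern: array compared with a list

is_empty = (arr.size == 0)  # an explicit size check avoids it entirely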

[Update 1]: New question describing the source of the problem: Why does comparison of a numpy array with a list consume so much memory?