Numpy memory error creating huge matrix

Posted 2019-01-09 08:12

Question:

I am using NumPy and trying to create a huge matrix. While doing this, I get a memory error.

Because the contents of the matrix are not important, I will just show an easy way to reproduce the error.

import numpy as np

a = 10000000000
data = np.array([float('nan')] * a)

Not surprisingly, this raises a MemoryError.

There are two things I would like to point out:

  1. I really need to create and to use a big matrix
  2. I think I have enough RAM to handle this matrix (I have 24 GB of RAM)

Is there an easy way to handle big matrices in numpy?

Just to be on the safe side, I have already read these posts (which sound similar):

Very large matrices using Python and NumPy

Python/Numpy MemoryError

Processing a very very big data set in python - memory error

P.S. Apparently I have some problems with multiplying and dividing numbers, which made me think that I had enough memory. So I think it is time for me to go to sleep, review my math, and maybe buy some memory.

Maybe in the meantime some genius will come up with an idea for how to actually create this matrix using only 24 GB of RAM.

Why I need this big matrix: I am not going to do any manipulations with it. All I need to do is save it into PyTables.

Answer 1:

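Assuming each floating-point number takes 4 bytes, you'd have

(10000000000 * 4) / (2**30.0) = 37.25290298461914

or about 37.25 GiB that you need to store in memory. So I don't think 24 GB of RAM is enough. Note that NumPy's default float type (float64) takes 8 bytes per element, which doubles the requirement to about 74.5 GiB.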
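The arithmetic above can be checked for both common float widths. This is only a minimal sketch: the helper name `array_size_gib` is made up for illustration, and the only NumPy feature it relies on is `np.dtype(...).itemsize`.

```python
import numpy as np

def array_size_gib(n_elements, dtype):
    """Return the size in GiB of a 1-D array of n_elements items of dtype."""
    # itemsize is the number of bytes one element of this dtype occupies.
    return n_elements * np.dtype(dtype).itemsize / 2**30

n = 10_000_000_000  # element count from the question
for dtype in (np.float32, np.float64):
    print(np.dtype(dtype).name, round(array_size_gib(n, dtype), 2), "GiB")
# float32 37.25 GiB
# float64 74.51 GiB
```

Either figure exceeds 24 GB. On a 64-bit build, the temporary Python list in the question (`[float('nan')] * a`) would also need roughly 8 bytes per element for its pointers before NumPy copies anything.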



Answer 2:

If you can't afford to create such a matrix but still wish to do some computations, try sparse matrices.

If you wish to pass it to another Python package that uses duck typing, you may create your own class with __getitem__ implementing dummy access.
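The duck-typing idea can be sketched roughly as follows. This is an illustration only: the class name `ConstantArray` and its attributes are hypothetical, it assumes a 64-bit Python build (so that `len()` can return such a large number), and a real consumer may require more of the array interface (slicing, `dtype`, iteration) than is provided here.

```python
import math

class ConstantArray:
    """Array-like object that reports a huge length but stores one value."""

    def __init__(self, length, value=float("nan")):
        self.length = length
        self.value = value
        self.shape = (length,)  # mimics the shape attribute of a 1-D array

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # Only integer indices are supported in this sketch.
        if not isinstance(index, int):
            raise TypeError("only integer indices are supported")
        if index < 0:
            index += self.length
        if not 0 <= index < self.length:
            raise IndexError("index out of range")
        # Every position holds the same value, so nothing is stored per item.
        return self.value

data = ConstantArray(10_000_000_000)
print(len(data), math.isnan(data[123]))
# 10000000000 True
```

The object occupies a few hundred bytes regardless of `length`, so it only helps when the receiving code reads elements by index rather than demanding a real NumPy buffer.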



Answer 3:

If you use the PyCharm editor for Python, you can change its memory settings in this file:

C:\Program Files\JetBrains\PyCharm 2018.2.4\bin\pycharm64.exe.vmoptions

These values set how much memory PyCharm itself may use. Lowering them slows PyCharm down but leaves more memory for your program. The lines to edit are:

-Xms1024m
-Xmx2048m
-XX:ReservedCodeCacheSize=960m

For example, you can change them to -Xms512m and -Xmx1024m. Your program should then have more memory available, but debugging performance in PyCharm will suffer.