I'm trying to run Mort Canty's Python iMAD implementation (http://mcanty.homepage.t-online.de/) on bitemporal RapidEye multispectral images. The algorithm basically computes the canonical correlation of the two images and then subtracts the resulting canonical variates. The problem I'm having is that the images are 5000 x 5000 pixels with 5 bands each, and if I try to run the code on a whole image I get a memory error.
Would the use of something like pyTables help me with this?
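(For concreteness, what I have in mind is something like the sketch below: keeping the big data matrix on disk instead of in RAM. This is just a guess at how it might look; the file name is a placeholder, and open_file/create_carray are the PyTables 3.x names.)

    import tables

    # hypothetical sketch: a disk-backed 10 x 25,000,000 data matrix,
    # so PyTables pages data to disk instead of holding it all in memory
    h5 = tables.open_file('imad_work.h5', mode='w')
    dm = h5.create_carray(h5.root, 'dm',
                          atom=tables.Float32Atom(),
                          shape=(2 * 5, 5000 * 5000))
    # dm could then be filled band by band, e.g. dm[k, :] = band.ravel()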
What Mort Canty's code does is load the two images using GDAL and stack them into a 10 x 25,000,000 array:
    # initial weights
    wt = ones(cols*rows)
    # data array (transposed so observations are columns)
    dm = zeros((2*bands, cols*rows))
    k = 0
    for b in pos:
        band1 = inDataset1.GetRasterBand(b+1)
        band1 = band1.ReadAsArray(x0, y0, cols, rows).astype(float)
        dm[k, :] = ravel(band1)
        band2 = inDataset2.GetRasterBand(b+1)
        band2 = band2.ReadAsArray(x0, y0, cols, rows).astype(float)
        dm[bands+k, :] = ravel(band2)
        k += 1
Even just creating a 10 x 25,000,000 numpy array of floats throws a memory error. Does anyone have a good idea of how to get around this? This is my first post ever, so any advice on how to post would also be welcome.
Greetings
numpy uses float64 by default, so your dm array takes up 2 GB of memory (8 * 10 * 25,000,000), and the other arrays probably take about 200 MB (~8 * 5000 * 5000) each.

astype(float) returns a new array, so you need memory for that as well, and it is probably not even needed: the type is converted implicitly when the data are copied into the result array. When the memory used inside the for loop is freed depends on garbage collection. And this doesn't account for the memory overhead of GetRasterBand and ReadAsArray.

Are you sure your input data uses 64-bit floats? If it uses 32-bit floats, you could easily halve the memory usage by specifying dtype='f' on your arrays.
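A minimal sketch of those two changes (32-bit allocation plus dropping the astype(float) copy), reusing the variable names from the question; whether float32 is precise enough depends on your input data:

    from numpy import zeros, ravel

    # allocate the data matrix as 32-bit floats: ~1 GB instead of ~2 GB
    # (4 bytes * 10 * 25,000,000)
    dm = zeros((2*bands, cols*rows), dtype='f')
    k = 0
    for b in pos:
        # no astype(float): assigning into dm converts the band implicitly,
        # which avoids a second full-size temporary copy per band
        band1 = inDataset1.GetRasterBand(b+1).ReadAsArray(x0, y0, cols, rows)
        dm[k, :] = ravel(band1)
        band2 = inDataset2.GetRasterBand(b+1).ReadAsArray(x0, y0, cols, rows)
        dm[bands+k, :] = ravel(band2)
        k += 1

Explicitly dropping the band references at the end of each iteration (del band1, band2) may also let the garbage collector reclaim the per-band temporaries sooner.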