I am currently exceeding the soft memory limit when I try to do simple writes to the Google App Engine blobstore. What is the proper way to write this code so that it does not leak memory?
    from __future__ import with_statement
    from google.appengine.api import files
    from google.appengine.api import blobstore

    def files_test(limit):
        file_name = files.blobstore.create(mime_type='application/octet-stream')
        try:
            with files.open(file_name, 'a') as f:
                for x in range(limit):
                    f.write("Testing \n")
        finally:
            files.finalize(file_name)
        return files.blobstore.get_blob_key(file_name)
files_test(4000) produces the error:
Exceeded soft private memory limit with 157.578 MB after servicing 27 requests total
Since this is Python 2.5, xrange should be better than range here, shouldn't it? You should also write all the data at once to avoid the problem and to speed things up: buffer it first, for example with StringIO (http://docs.python.org/library/stringio.html). That should stop the symptom even if it does not fix the underlying bug. Keep in mind that file.write may not be a simple write but a request to an RPC API that is slow to set up (see the SDK code), so avoid many small calls and buffer instead.
With such a small amount of data (4000 * 9 bytes) this should not happen; it looks like a bug in the Google API, so just report it: http://code.google.com/p/googleappengine/issues/list?can=2&q=&sort=-id&colspec=ID%20Type%20Component%20Status%20Stars%20Summary%20Language%20Priority%20Owner%20Log
Consider that 'create' is marked as experimental: http://code.google.com/intl/pl/appengine/docs/python/blobstore/overview.html#Writing_Files_to_the_Blobstore
Also fix the finalization bug in your code: the finally block finalizes the file even when a write raised an exception, and the function then returns a blob key for an invalid, half-written file.
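One way to address that last point, sketched under the assumption that you only want a blob key for a fully written file; the function name files_test_fixed is hypothetical:

    from __future__ import with_statement
    from google.appengine.api import files

    def files_test_fixed(limit):
        file_name = files.blobstore.create(mime_type='application/octet-stream')
        with files.open(file_name, 'a') as f:
            for x in xrange(limit):
                f.write("Testing \n")
        # Finalize only after all writes succeeded; if a write raised, the
        # exception propagates and no half-written file is finalized or returned.
        files.finalize(file_name)
        return files.blobstore.get_blob_key(file_name)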
Maybe I'm wrong, but I am quite sure that the number of calls to write() is the problem here. I think so because I had a similar problem when I tried to save a file uploaded by a user. For me, copying the upload in many small chunks was causing the problem; it went away when I changed the chunk size to 1 MB instead of 1 KB (a sketch of such a chunked copy follows below).
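A hedged sketch of what that chunked copy might look like; the uploaded_file object, the CHUNK_SIZE constant, and the loop structure are assumptions for illustration, not the answer's actual code:

    from __future__ import with_statement
    from google.appengine.api import files

    CHUNK_SIZE = 1024 * 1024  # 1 MB per write() call instead of 1 KB

    def save_upload(uploaded_file):
        # uploaded_file is assumed to be a file-like object holding the user's upload.
        file_name = files.blobstore.create(mime_type='application/octet-stream')
        with files.open(file_name, 'a') as f:
            while True:
                chunk = uploaded_file.read(CHUNK_SIZE)
                if not chunk:
                    break
                f.write(chunk)  # far fewer write() RPCs with larger chunks
        files.finalize(file_name)
        return files.blobstore.get_blob_key(file_name)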
Unfortunately Python's garbage collector is not perfect. Every write you do creates lots of small objects (via protocol buffer creation) that for some reason are not collected by Python on the fly. I found that in the mapreduce library I have to call gc.collect() from time to time to keep the garbage collector happy.
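For example, a minimal sketch of forcing a collection every so often inside the question's write loop; the every-1000-iterations threshold is an arbitrary illustration:

    from __future__ import with_statement
    import gc
    from google.appengine.api import files

    def files_test_gc(limit):
        file_name = files.blobstore.create(mime_type='application/octet-stream')
        with files.open(file_name, 'a') as f:
            for x in xrange(limit):
                f.write("Testing \n")
                if x % 1000 == 0:
                    gc.collect()  # reclaim the small protobuf objects each write leaves behind
        files.finalize(file_name)
        return files.blobstore.get_blob_key(file_name)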