In Google App Engine, how do I reduce memory consumption?

Posted 2019-03-13 08:45

I'm using the blobstore to back up and restore entities in CSV format. The process works well for all of my smaller models. However, once I start working on models with more than 2K entities, I exceed the soft memory limit. I'm only fetching 50 entities at a time and then writing the results out to the blobstore, so it's not clear to me why my memory usage keeps building up. I can reliably make the method fail just by increasing the "limit" value passed in below, which makes the method run only a little longer and export a few more entities.

  1. Any recommendations on how to optimize this process to reduce memory consumption?

  2. Also, the files produced will only be <500 KB in size. Why would the process use 140 MB of memory?

Simplified example:

file_name = files.blobstore.create(mime_type='application/octet-stream')
with files.open(file_name, 'a') as f:
    writer = csv.DictWriter(f, fieldnames=properties)
    for entity in models.Player.all():
        row = backup.get_dict_for_entity(entity)
        writer.writerow(row)

Produces the error: Exceeded soft private memory limit with 150.957 MB after servicing 7 requests total

Simplified example 2:

The problem seems to be with using the files API and the with statement in Python 2.5. Factoring out the CSV code, I can reproduce almost the same error simply by trying to write a 4000-line text file to the blobstore.

from __future__ import with_statement
import StringIO
from google.appengine.api import files
from google.appengine.ext.blobstore import blobstore

file_name = files.blobstore.create(mime_type='application/octet-stream')
myBuffer = StringIO.StringIO()

#Put 4000 lines of text in myBuffer

with files.open(file_name, 'a') as f:
    for line in myBuffer.getvalue().splitlines():
        f.write(line)

files.finalize(file_name)
blob_key = files.blobstore.get_blob_key(file_name)

Produces the error: Exceeded soft private memory limit with 154.977 MB after servicing 24 requests total

Original:

def backup_model_to_blobstore(model, limit=None, batch_size=None):
    file_name = files.blobstore.create(mime_type='application/octet-stream')
    # Open the file and write to it
    with files.open(file_name, 'a') as f:
        # Get the fieldnames for the csv file.
        query = model.all().fetch(1)
        entity = query[0]
        properties = entity.__class__.properties()
        # Add ID as a property
        properties['ID'] = entity.key().id()

        # For debugging rather than try and catch
        if True:
            writer = csv.DictWriter(f, fieldnames=properties)
            # Write out a header row
            headers = dict((n, n) for n in properties)
            writer.writerow(headers)

            numBatches = int(limit / batch_size)
            if numBatches == 0:
                numBatches = 1

            for x in range(numBatches):
                logging.info("************** querying with offset %s and limit %s", x * batch_size, batch_size)
                query = model.all().fetch(limit=batch_size, offset=x * batch_size)
                for entity in query:
                    # This just returns a small dictionary with the key-value pairs
                    row = get_dict_for_entity(entity)
                    # Write out a row for each entity.
                    writer.writerow(row)

    # Finalize the file. Do this before attempting to read it.
    files.finalize(file_name)

    blob_key = files.blobstore.get_blob_key(file_name)
    return blob_key

The error looks like this in the logs:

......
2012-02-02 21:59:19.063
************** querying with offset 2050 and limit 50
I 2012-02-02 21:59:20.076
************** querying with offset 2100 and limit 50
I 2012-02-02 21:59:20.781
************** querying with offset 2150 and limit 50
I 2012-02-02 21:59:21.508
Exception for: Chris (202.161.57.167)

err:
Traceback (most recent call last):
  .....
    blob_key = backup_model_to_blobstore(model, limit=limit, batch_size=batch_size)
  File "/base/data/home/apps/singpath/163.356548765202135434/singpath/backup.py", line 125, in backup_model_to_blobstore
    writer.writerow(row)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 281, in __exit__
    self.close()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 275, in close
    self._make_rpc_call_with_retry('Close', request, response)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 179, in _raise_app_error
    raise FileNotOpenedError()
FileNotOpenedError

C 2012-02-02 21:59:23.009
Exceeded soft private memory limit with 149.426 MB after servicing 14 requests total

4 Answers
Luminary・发光体 · 2019-03-13 09:21

It could also be a deadline-exceeded error, since requests are limited to 30 seconds. To get around that in my implementation, instead of having a webapp handler run the operation, I fire a task on the default queue. The nice thing about the queue is that it takes one line of code to invoke, it gives you a 10-minute time limit, and if a task fails it is retried. I'm not really sure it will solve your problem, but it's worth a try.

from google.appengine.api import taskqueue
...
taskqueue.add(url='/the/url/that/invokes/your/method')

You can find more info about queues in the Task Queue documentation.

Or consider using a backend for serious computations and file operations.
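For example, here is a minimal sketch of pushing the backup onto the default queue so it runs under the task queue's 10-minute deadline; the handler path, handler class, and parameter names are my own assumptions, not from the original answer:

from google.appengine.api import taskqueue
from google.appengine.ext import webapp


class BackupWorker(webapp.RequestHandler):
    """Runs inside a push task, so it gets ~10 minutes instead of 30 seconds."""
    def post(self):
        model_name = self.request.get('model')
        # ... look up the model class by name and call backup_model_to_blobstore() ...


# In the normal request handler, enqueue the work and return immediately:
taskqueue.add(url='/tasks/backup', params={'model': 'Player'})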

Viruses. · 2019-03-13 09:32

You'd be better off not doing the batching yourself, but just iterating over the query. The iterator will pick a batch size (probably 20) that should be adequate:

q = model.all()
for entity in q:
    row = get_dict_for_entity(entity)
    writer.writerow(row)

This avoids re-running the query with ever-increasing offset, which is slow and causes quadratic behavior in the datastore.
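If you do need explicit batches anyway (for example, to spread the work across several tasks), a datastore cursor avoids the offset cost. Here is a minimal sketch using the old db API; iterate_in_batches is a hypothetical helper, not part of the original answer:

from google.appengine.ext import db

def iterate_in_batches(model, batch_size=50):
    # Page through the model with a cursor instead of ever-growing offsets.
    query = model.all()
    cursor = None
    while True:
        if cursor:
            query.with_cursor(cursor)
        batch = query.fetch(batch_size)
        if not batch:
            break
        for entity in batch:
            yield entity
        cursor = query.cursor()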

An oft-overlooked fact about memory usage is that the in-memory representation of an entity can use 30-50 times the RAM compared to the serialized form of the entity; e.g. an entity that is 3KB on disk might use 100KB in RAM. (The exact blow-up factor depends on many factors; it's worse if you have lots of properties with long names and small values, even worse for repeated properties with long names.)

Summer. ? 凉城 · 2019-03-13 09:34

I can't speak to the memory use in Python, but considering your error message, it most likely stems from the fact that a blobstore-backed file in GAE can't be kept open for more than around 30 seconds, so you have to close and reopen it periodically if your processing takes longer.
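A minimal sketch of what that could look like; the chunking helper and the 500-row chunk size are assumptions, not from the original answer:

from google.appengine.api import files

def write_rows_in_chunks(rows, chunk_size=500):
    file_name = files.blobstore.create(mime_type='application/octet-stream')
    for start in range(0, len(rows), chunk_size):
        # Each with-block opens the file, appends one chunk, and closes it
        # again, so no single open lasts anywhere near 30 seconds.
        with files.open(file_name, 'a') as f:
            for row in rows[start:start + chunk_size]:
                f.write(row + '\n')
    files.finalize(file_name)
    return files.blobstore.get_blob_key(file_name)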

做自己的国王 · 2019-03-13 09:43

A similar problem was reported in "What is the proper way to write to the Google App Engine blobstore as a file in Python 2.5". An answer there suggests inserting occasional gc.collect() calls. Given what I know of the files API's implementation, I think that is spot on. Give it a try!
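A minimal sketch of what that might look like in the question's write loop, reusing the names from the question's code; the every-100-rows threshold is an assumption:

import gc

with files.open(file_name, 'a') as f:
    writer = csv.DictWriter(f, fieldnames=properties)
    for count, entity in enumerate(model.all()):
        writer.writerow(get_dict_for_entity(entity))
        if count % 100 == 0:
            gc.collect()  # nudge the collector; the files API may be holding buffers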
