google app engine NDB records counts from NDB mode

2019-08-27 11:34发布

How many records we can get from google app engine from single query so that we can display count to user and is we can increase timeout limit 3 seconds to 5 seconds

1条回答
贪生不怕死
2楼-- · 2019-08-27 11:45

In my experience, ndb cannot pull more than 1000 records at a time. Here is an example of what happens if I try to use .count() on a table that contains ~500,000 records.

s~project-id> models.Transaction.query().count()
WARNING:root:suspended generator _count_async(query.py:1330) raised AssertionError()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/utils.py", line 160, in positional_wrapper
    return wrapped(*args, **kwds)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/query.py", line 1287, in count
    return self.count_async(limit, **q_options).get_result()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 383, in get_result
    self.check_success()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
    value = gen.throw(exc.__class__, exc, tb)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/query.py", line 1330, in _count_async
    batch = yield rpc
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 513, in _on_rpc_completion
    result = rpc.get_result()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 614, in get_result
    return self.__get_result_hook(self)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/datastore/datastore_query.py", line 2910, in __query_result_hook
    self._batch_shared.conn.check_rpc_success(rpc)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/datastore/datastore_rpc.py", line 1377, in check_rpc_success
    rpc.check_success()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 580, in check_success
    self.__rpc.CheckSuccess()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_rpc.py", line 157, in _WaitImpl
    self.request, self.response)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 308, in MakeSyncCall
    handler(request, response)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 362, in _Dynamic_Next
    assert next_request.offset() == 0
AssertionError

To by pass this, you can do something like:

objs = []
q = None
more = True
while more:
    _objs, q, more = models.Transaction.query().fetch_page(300, start_cursor=q)
    objs.extend(_objs)

But even that will eventually hit memory/timeout limits.

Currently I use Google Dataflow to pre-compute these values and store the results in Datastore as the models DaySummaries & StatsPerUser

EDIT:

snakecharmerb is correct. I was able to use .count() in the production environment, but the more entities it has to count, the longer it seems to take. Here's a screenshot of my logs viewer where it took ~15 seconds to count ~330,000 records

enter image description here

When I tried adding a filter to that query which returned a count of ~4500, it took about a second to run instead.

EDIT #2:

Ok I had another app engine project with a kind with ~8,000,000 records. I tried to do .count() on that in my http request handler and the request timed-out after running for 60 seconds.

enter image description here

查看更多
登录 后发表回答