Datastore fetch VS fetch(keys_only=True) then get_

Posted 2019-02-25 23:16

I am fetching 100+ entities from Datastore using the query below:

return entity.query(ancestor=ancestorKey).filter(entity.year == myStartYear).order(entity.num).fetch()

This was taking a long time to load (on the order of a few seconds).

To find a better approach, I created exactly 100 entities and found that fetching them takes between 750 ms and 1000 ms on the local server, which is a lot. I am not sure how to make a single-line fetch more efficient!

In a desperate attempt to optimize, I tried

  1. Removing the order part, still got the same results
  2. Removing the filter part, still got the same results
  3. Removing the order & filter part, still got the same results

So apparently the bottleneck is something else. As a last resort, I tried fetching keys only and then passing the keys to ndb.get_multi():

qKeys = entity.query(ancestor=ancestorKey).filter(entity.year == myStartYear).order(entity.num).fetch(keys_only=True)

return ndb.get_multi(qKeys)

To my surprise, I get better performance! Query results now load in 450 ms to 550 ms, which is roughly 40% faster on average.

I am not sure why this happens. I would have thought that fetch() already retrieves entities in the most efficient way.

Question: Any idea how I can optimize the single query line to load faster?

Side Question: Does anyone know the underlying mechanism of the fetch function, and why fetching keys only and then using ndb.get_multi() is faster?

1 answer
一纸荒年 Trace。
Reply #2 · 2019-02-25 23:53

FWIW, you shouldn't expect meaningful results from datastore performance tests performed locally, using either the development server or the datastore emulator. Both are just emulators: they don't have the same performance (or even 100% equivalent functionality) as the real datastore.

Credit goes to @snakecharmerb, who correctly identified the culprit, confirmed by OP:

Be aware that performance characteristics in the cloud may differ from those on your local machine. You really want to be running these tests in the cloud. – snakecharmerb yesterday

@snakecharmerb you were right! I just tested on the cloud, and performance is actually the other way around there: fetch() took ~550ms, while fetch(keys_only=True) followed by get_multi took ~700ms. It seems that fetch() works better on the cloud! – Khaled yesterday
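
For anyone who wants to repeat the comparison on a deployed app, here is a minimal timing sketch. It is not from the original thread: `time_call` and `compare` are hypothetical helper names, and the `time.sleep` workloads are stand-ins so the snippet runs anywhere. In a real test you would pass callables that run the two query variants from the question. Repeated key lookups may also be affected by NDB's caching, so treat the numbers as rough.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Call fn() `repeats` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def compare(strategies, repeats=5):
    """Return {name: median_ms} for a dict of zero-argument callables."""
    return {name: time_call(fn, repeats) for name, fn in strategies.items()}


if __name__ == "__main__":
    # Stand-in workloads (assumption). On a deployed app, replace them with
    # callables wrapping the two variants from the question:
    #   the plain .fetch() query, and
    #   ndb.get_multi(<same query>.fetch(keys_only=True))
    results = compare({
        "fetch": lambda: time.sleep(0.002),
        "keys_only + get_multi": lambda: time.sleep(0.001),
    })
    for name, ms in results.items():
        print("%s: %.1f ms" % (name, ms))
```

Using the median over several runs rather than a single measurement reduces the effect of one-off outliers such as a cold start.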
