Problem
Running a datastore query takes the same execution time with or without FetchOptions.Builder.withLimit(100). Why is that? Isn't the limit meant to reduce the time needed to retrieve results?
Test setup
I am testing the execution time of some datastore queries locally with Google App Engine. I am using the Google Cloud SDK for the standard environment with App Engine SDK 1.9.59.
For the test, I created an example entity with 5 indexed properties and 5 unindexed properties, and filled the datastore with 50,000 entries of this test entity. I run the following method to retrieve 100 of these entities using withLimit():
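```java
import com.google.appengine.api.datastore.*;
import java.util.ArrayList;
import java.util.List;

// Low-level datastore service used by the query below.
private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

public List<Long> getTestIds() {
    List<Long> ids = new ArrayList<>();
    // Ask for at most 100 results.
    FetchOptions fetchOptions = FetchOptions.Builder.withLimit(100);
    // Keys-only query: only entity keys are returned, not property values.
    Query q = new Query("test_kind").setKeysOnly();
    for (Entity entity : datastore.prepare(q).asIterable(fetchOptions)) {
        ids.add(entity.getKey().getId());
    }
    return ids;
}
```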
I measure the time before and after calling this method:
```java
long start = System.currentTimeMillis();
int size = getTestIds().size();
long end = System.currentTimeMillis();
log.info("time: " + (end - start) + " results: " + size);
```
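A single currentTimeMillis measurement of a first call can also include one-off costs such as class loading and JIT compilation, which may blur the comparison between two queries. The following is a minimal timing sketch, assuming plain Java 8 or later and no App Engine dependency: it warms the task up, then reports the median of several runs measured with System.nanoTime(). The class QueryTimer, the method medianMillis and the stand-in workload are hypothetical; in the real test the supplier would be `() -> getTestIds()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class QueryTimer {

    /**
     * Hypothetical helper: runs the task a few times without measuring (warm-up),
     * then returns the median elapsed time of the measured runs in milliseconds.
     * For an even number of runs this returns the upper of the two middle values.
     */
    static double medianMillis(Supplier<?> task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.get(); // warm-up runs are not recorded
        }
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // intended for measuring elapsed time
            task.get();
            samples.add(System.nanoTime() - start);
        }
        Collections.sort(samples);
        long median = samples.get(samples.size() / 2);
        return median / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the real test this would be () -> getTestIds().
        Supplier<Integer> work = () -> {
            int sum = 0;
            for (int i = 0; i < 100_000; i++) {
                sum += i % 7;
            }
            return sum;
        };
        double ms = medianMillis(work, 3, 5);
        System.out.println("median time: " + ms + " ms");
    }
}
```

Timing the limited and the unlimited query with the same helper keeps warm-up effects out of the comparison, although it cannot make a local emulator behave like the production datastore.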
I log the execution time and the number of returned results.
Results
When I do not use the withLimit() FetchOptions for the query, I get the expected 50,000 results in about 1740 ms. Nothing surprising here.
When I run the code as shown above with withLimit(100), I get the expected 100 results. However, the query still takes about the same 1740 ms.
I tested with different numbers of datastore entries and different limits. In every case, the query took the same time with or without the limit.
Question
Why does the query still appear to fetch all entities? Surely the query is not supposed to retrieve all entities when the limit is set to 100, right? What am I missing? Is there some datastore configuration for this? After four days of testing and searching the web, I still can't find the problem.
Answer
FWIW, you shouldn't expect meaningful results from datastore performance tests performed locally, whether with the development server or the datastore emulator. They are just emulators: they don't have the same performance (or even 100% equivalent functionality) as the real datastore.
See, for example, the question "Datastore fetch VS fetch(keys_only=True) then get_multi" (including its comments).
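The answer does not say why the limit saves no time locally. One possible reason, offered purely as an illustration and not as a description of how the development server is actually implemented, is that an implementation may read every matching entity and only then apply the limit. The sketch below uses a hypothetical in-memory FakeStore (not an App Engine class) to contrast such an eager implementation with a lazy one that stops reading once the limit is reached. It counts entity reads instead of measuring time so that the difference is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitSketch {

    /** Hypothetical in-memory store that counts how many entities it reads. */
    static class FakeStore {
        final int size;
        int reads = 0;

        FakeStore(int size) {
            this.size = size;
        }

        long read(int index) {
            reads++;
            return index + 1L; // pretend this is the entity's numeric ID
        }

        /** Eager: reads every entity first, then truncates to the limit. */
        List<Long> queryEager(int limit) {
            List<Long> all = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                all.add(read(i));
            }
            return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
        }

        /** Lazy: stops reading as soon as the limit is reached. */
        List<Long> queryLazy(int limit) {
            List<Long> result = new ArrayList<>();
            for (int i = 0; i < size && result.size() < limit; i++) {
                result.add(read(i));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        FakeStore eager = new FakeStore(50_000);
        List<Long> a = eager.queryEager(100);
        // prints "eager: 100 results, 50000 reads"
        System.out.println("eager: " + a.size() + " results, " + eager.reads + " reads");

        FakeStore lazy = new FakeStore(50_000);
        List<Long> b = lazy.queryLazy(100);
        // prints "lazy: 100 results, 100 reads"
        System.out.println("lazy: " + b.size() + " results, " + lazy.reads + " reads");
    }
}
```

Both variants return the same 100 IDs, so the result count alone cannot reveal which strategy a backend uses; only the amount of work (here, the read counter) differs.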