I have approximately 400,000 documents in a GAE Search index. All documents have a location
GeoPoint
property and are spread over the entire globe. Some documents might be over 4000km away from any other document, others might be bunched within meters of each other.
I would like to find the closest document to a specific set of coordinates but find the following code gives incorrect results:
from google.appengine.api import search
# coords are in the form of a tuple e.g. (50.123, 1.123)
search.Document(
doc_id='meaningful-unique-id',
fields=[search.GeoField(name='location'
value=search.GeoPoint(coords[0], coords[1]))])
# find document function radius is in metres
def find_document(coords, radius=1000000):
sort_expr = search.SortExpression(
expression='distance(location, geopoint(%.3f, %.3f))' % coords,
direction=search.SortExpression.ASCENDING,
default_value=0)
search_query = search.Query(
query_string='distance(location, geopoint(%.3f, %.3f)) < %d' \
% (coords[0], coords[1], radius),
options=search.QueryOptions(
limit=1,
ids_only=True,
sort_options=search.SortOptions(expressions=[sort_expr])))
index = search.Index(name='document-index')
return index.search(search_query)
With this code I will get results that are consistent but incorrect. For example, a search for the nearest document to London indicated the closest one was in Scotland. I have verified that there are thousands of closer documents.
I narrowed the problem down to the radius
parameter being too large. I get correct results if the radius is down to around 12km (radius=12000
). There are generally no more than 1000 documents in a 12 km radius. (Probably associated with search.SortOptions(limit=1000)
.)
The problem is that if I am in a sparse area of the globe where there aren't any documents for thousands of miles, my search function will not return anything with radius=12000
(12km). I want it to return the closest document to me wherever I am. How can I accomplish this consistently with one call to the Search API?