I'm trying to annotate a queryset of Station
s with the id
of the nearest neighbouring Station
using Django 2.0.3 and PostGIS (GeoDjango) functions.
Simplified Station
model:
class Station(models.Model):
name = models.CharField(max_length=128)
location = models.PointField()
objects = StationQuerySet.as_manager()
The problem I'm having is trying to compute the closest distance, which involves annotating a subquery which refers to the location
in the outer queryset.
from django.db.models import OuterRef, Subquery
from django.contrib.gis.db.models.functions import Distance
class StationQuerySet(models.QuerySet):
def add_nearest_neighbour(self):
'''
Annotates each station with the id and distance of the nearest neighbouring station
'''
# Get Station model
Station = self.model
# Calculate distances to each station in subquery
subquery_with_distance = Station.objects.annotate(distance=Distance('location', OuterRef('location')) / 1000)
# Get nearest from subquery
nearest = subquery_with_distance.order_by('distance').values('id')[0]
return self.annotate(
nearest_station_id=Subquery(nearest)
)
The line distance = Station.objects.annotate(distance=Distance('location', OuterRef('location')) / 1000)
results in an error seen below:
from apps.bikeshare.models import Station
stations = Station.objects.add_nearest_neighbour()
Error:
Traceback (most recent call last):
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2847, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-3-cb35ea6d5d8b>", line 1, in <module>
stations = Station.objects.add_nearest_neighbour()
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/manager.py", line 82, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/home/gbrown/Development/transit_bikeshare/apps/bikeshare/querysets.py", line 162, in add_nearest_neighbour
subquery_with_distance = Station.objects.annotate(distance=Distance('location', OuterRef('location')) / 1000)
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/manager.py", line 82, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/query.py", line 997, in annotate
clone.query.add_annotation(annotation, alias, is_summary=False)
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/sql/query.py", line 975, in add_annotation
summarize=is_summary)
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/expressions.py", line 452, in resolve_expression
c.lhs = c.lhs.resolve_expression(query, allow_joins, reuse, summarize, for_save)
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/contrib/gis/db/models/functions.py", line 58, in resolve_expression
source_fields = res.get_source_fields()
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/expressions.py", line 349, in get_source_fields
return [e._output_field_or_none for e in self.get_source_expressions()]
File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/expressions.py", line 349, in <listcomp>
return [e._output_field_or_none for e in self.get_source_expressions()]
AttributeError: 'ResolvedOuterRef' object has no attribute '_output_field_or_none'
Came up with a work around using a raw query to find the nearest station and select the id AND distance from the subquery, bonus explanation below:
Usage
You can chain querysets calls together to filter down the queryset before finding the nearest neighbour:
SQL Explanation
This type of subquery is called a correlated subquery because it references columns in the outer query. In addition, I need to select multiple pieces of information about the nearest station (
id
,distance
, etc.).The subquery is placed in the
FROM
clause, which allows multiple columns to be selected. ALATERAL
join is needed to allow the subquery to reference the sibling table in theFROM
list. With the subquery returning a single row, aCROSS
join can be applied to form a joined table based on the cartesian product rather than on a shared column.The subquery uses the PostGIS
<->
operator, which is much more efficient at ordering the table by distance between the stations, andst_distance_sphere
, to do an accurate distance calculation between the points.