Django: Annotation on Subquery

Posted 2019-07-18 10:42

I'm trying to annotate a queryset of Stations with the id of the nearest neighbouring Station using Django 2.0.3 and PostGIS (GeoDjango) functions.

Simplified Station model:

from django.contrib.gis.db import models  # GeoDjango models module, provides PointField

class Station(models.Model):
    name = models.CharField(max_length=128)
    location = models.PointField()
    objects = StationQuerySet.as_manager()

The problem is computing the distance to the closest station: this requires annotating with a subquery that refers to the location of the row in the outer queryset.

from django.db import models
from django.db.models import OuterRef, Subquery
from django.contrib.gis.db.models.functions import Distance

class StationQuerySet(models.QuerySet):

    def add_nearest_neighbour(self):
        '''
        Annotates each station with the id and distance of the nearest neighbouring station
        '''
        # Get Station model
        Station = self.model

        # Calculate distances to each station in subquery
        subquery_with_distance = Station.objects.annotate(distance=Distance('location', OuterRef('location')) / 1000)

        # Get nearest from subquery; slice with [:1] rather than [0] so the
        # queryset stays lazy and can be passed to Subquery
        nearest = subquery_with_distance.order_by('distance').values('id')[:1]

        return self.annotate(
            nearest_station_id=Subquery(nearest)
        )

The line subquery_with_distance = Station.objects.annotate(distance=Distance('location', OuterRef('location')) / 1000) raises the error shown below when the method is called:

from apps.bikeshare.models import Station
stations = Station.objects.add_nearest_neighbour()

Error:

Traceback (most recent call last):
  File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2847, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-cb35ea6d5d8b>", line 1, in <module>
    stations = Station.objects.add_nearest_neighbour()
  File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/gbrown/Development/transit_bikeshare/apps/bikeshare/querysets.py", line 162, in add_nearest_neighbour
    subquery_with_distance = Station.objects.annotate(distance=Distance('location', OuterRef('location')) / 1000)
  File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/query.py", line 997, in annotate
    clone.query.add_annotation(annotation, alias, is_summary=False)
  File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/sql/query.py", line 975, in add_annotation
    summarize=is_summary)
  File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/expressions.py", line 452, in resolve_expression
    c.lhs = c.lhs.resolve_expression(query, allow_joins, reuse, summarize, for_save)
  File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/contrib/gis/db/models/functions.py", line 58, in resolve_expression
    source_fields = res.get_source_fields()
  File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/expressions.py", line 349, in get_source_fields
    return [e._output_field_or_none for e in self.get_source_expressions()]
  File "/home/gbrown/Envs/bikeshare-dev/lib/python3.5/site-packages/django/db/models/expressions.py", line 349, in <listcomp>
    return [e._output_field_or_none for e in self.get_source_expressions()]
AttributeError: 'ResolvedOuterRef' object has no attribute '_output_field_or_none'

1 answer
成全新的幸福
#2 · 2019-07-18 11:35

I came up with a workaround that uses a raw query to find the nearest station and select both the id and the distance from the subquery. An explanation of the SQL follows further down:

class StationQuerySet(models.QuerySet):

    def nearest_neighbour(self):
        '''
        Creates a RawQuerySet of each station with the id and distance of the nearest neighbouring station
        '''
        # The outer queryset has to be executed here to get the ids to inject.
        # Note: an empty tuple would render as "IN ()", which PostgreSQL rejects.
        ids = tuple(self.values_list('id', flat=True))

        return self.raw('''
               SELECT
                 A0.id   as id,
                 SUB.closest_id,
                 SUB.closest_distance
               FROM "bikeshare_station" A0
                 CROSS JOIN LATERAL (
                            SELECT
                              B0.id   as closest_id,
                              st_distance_sphere(A0.location, B0.location) as closest_distance
                            FROM "bikeshare_station" B0
                            WHERE A0.id != B0.id
                            ORDER BY A0.location <-> B0.location
                            limit 1
                            ) SUB
               WHERE A0.id IN %s;
           ''', [ids])

Usage

You can chain queryset calls together to filter the queryset down before finding the nearest neighbour:

query = Station.objects.filter(name='Albert Gate, Hyde Park')
closest_stations = query.nearest_neighbour()
station = closest_stations[0]
station.name
[out]: 'Albert Gate, Hyde Park'
station.closest_distance
[out]: 133.52459069
station.closest_id
[out]: 6369

SQL Explanation

This type of subquery is called a correlated subquery because it references columns in the outer query. In addition, I need to select multiple pieces of information about the nearest station (id, distance, etc.).

The subquery is placed in the FROM clause, which allows it to return multiple columns. A LATERAL join is needed so that the subquery can reference the sibling table (A0) in the FROM list. Because the subquery returns at most one row per outer row, a CROSS join can be used to form the joined table from the Cartesian product rather than from a shared column.
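
As a rough illustration of what the lateral join computes (not how PostgreSQL executes it), here is a plain-Python sketch. StationRow, planar_distance and nearest_neighbours are made-up names for this example, and the straight-line distance merely stands in for the SQL distance expression.

```python
import math
from typing import Callable, List, NamedTuple, Tuple


class StationRow(NamedTuple):
    """Hypothetical stand-in for one row of bikeshare_station."""
    id: int
    x: float
    y: float


def planar_distance(a: StationRow, b: StationRow) -> float:
    # Straight-line distance in coordinate units (an assumption for this
    # sketch; the SQL uses st_distance_sphere instead).
    return math.hypot(a.x - b.x, a.y - b.y)


def nearest_neighbours(
    stations: List[StationRow],
    distance: Callable[[StationRow, StationRow], float] = planar_distance,
) -> List[Tuple[int, int, float]]:
    rows = []
    for a in stations:  # each row of the outer table A0
        # WHERE A0.id != B0.id
        candidates = [b for b in stations if b.id != a.id]
        if not candidates:
            # CROSS JOIN LATERAL drops outer rows whose subquery is empty
            continue
        # ORDER BY distance LIMIT 1
        closest = min(candidates, key=lambda b: distance(a, b))
        # Several columns come back from the one correlated subquery
        rows.append((a.id, closest.id, distance(a, closest)))
    return rows


stations = [
    StationRow(1, 0.0, 0.0),
    StationRow(2, 3.0, 4.0),
    StationRow(3, 10.0, 0.0),
]
result = nearest_neighbours(stations)
```

The inner loop body plays the role of the subquery: it is re-evaluated for every outer row and can see that row's columns, which is exactly what LATERAL permits.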

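The subquery uses two PostGIS features. The <-> operator orders rows by distance and can use a spatial (GiST) index when one exists, which makes it much cheaper than sorting the whole table on a computed distance. st_distance_sphere (renamed ST_DistanceSphere in PostGIS 2.2) then computes an accurate distance in metres between the two points.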
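
To show why a separate spherical distance is worth computing for lon/lat points, here is a small haversine sketch. It is not the PostGIS implementation: sphere_distance is a hypothetical helper, and the Earth radius constant is an assumed mean value.

```python
import math

# Assumed mean Earth radius in metres (not necessarily the value PostGIS uses)
EARTH_RADIUS_M = 6371008.8


def sphere_distance(lon1: float, lat1: float, lon2: float, lat2: float,
                    radius: float = EARTH_RADIUS_M) -> float:
    """Great-circle distance between two lon/lat points in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))


# One degree of longitude measured at the equator and at 60 degrees north
at_equator = sphere_distance(0.0, 0.0, 1.0, 0.0)
at_60_north = sphere_distance(0.0, 60.0, 1.0, 60.0)
```

One degree of longitude covers roughly half as many metres at 60 degrees north as at the equator, so a distance measured in raw degrees is not a reliable distance in metres.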
