Django Postgres ArrayField aggregation and filteri

2019-09-07 14:53发布

Following on from this question: Django Postgresql ArrayField aggregation

I have an ArrayField of Categories and I would like to retrieve all unique values it has - however the results should be filtered so that only values starting with the supplied string are returned.

What's the "most Django" way of doing this?

Given an Animal model that looks like this:

class Animal(models.Model):
    # ...
    categories = ArrayField(
        models.CharField(max_length=255, blank=True),
        default=list,
    )
    # ...

Then, as per the other question's answer, this works for finding all categories, unfiltered.

all_categories = (
    Animal.objects
    .annotate(categories_element=Func(F('categories'), function='unnest'))
    .values_list('categories_element', flat=True)
    .distinct()
)

However, now, when I attempt to filter the result I get failure, not just with __startswith but all types of filter:

all_categories.filter(categories_element__startswith('ga'))
all_categories.filter(categories_element='dog')

Bottom of stacktrace is:

DataError: malformed array literal: "dog"
...
DETAIL:  Array value must start with "{" or dimension information.

... and it appears that it's because Django tries to do a second UNNEST - this is the SQL it generates:

...) WHERE unnest("animal"."categories") = dog::text[]

If I write the query in PSQL then it appears to require a subquery as a result of the UNNEST:

SELECT categories_element
FROM (
    SELECT UNNEST(animal.categories) as categories_element
) ul
WHERE ul.categories_element like 'Ga%';

Is there a way to get Django ORM to make a working query? Or should I just give up on the ORM and use raw SQL?

1条回答
唯我独甜
2楼-- · 2019-09-07 15:51

You probably have the wrong database design.

Tip: Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.

http://www.postgresql.org/docs/9.1/static/arrays.html

查看更多
登录 后发表回答