Use of full-text search + GIN in a view (Django 1.

2019-07-17 04:39发布

问题:

I need some help with building proper query in a django view for full-text search using GIN index. I have quite a big database (~400k lines) and need to do a full-text search on 3 fields from it. Tried to use django docs search and this is code BEFORE GIN. It works, but takes 6+ seconds to search over all fields. Next I tried to implement a GIN index to speed up my search. There are a lot of questions already how to build it. But my question is - how does the view query change when using a GIN index for search? What fields should I search?

Before GIN:

models.py

class Product(TimeStampedModel):
    product_id = models.AutoField(primary_key=True, )
    shop = models.ForeignKey('Shop', to_field='shop_name')
    brand = models.ForeignKey('Brand', to_field='brand_name')
    title = models.TextField(blank=False, null=False)
    description = models.TextField(blank=True, null=True)

views.py

def get_cosmetic(request):
if request.method == "GET":
    pass
else:
    search_words = request.POST.get('search')
    search_vectors = SearchVector('title', weight='B')+ SearchVector('description', weight='C') + SearchVector('brand__brand_name', weight='A')

    products = Product.objects.annotate(search = search_vectors, rank=SearchRank(search_vectors, search))\
        .filter(search=search_words ).order_by('-rank')

    return render(request, 'example.html', {"products": products})

After GIN:
models.py

class ProductManager(models.Manager):
def with_documents(self):
    vector = pg_search.SearchVector('brand__brand_name', weight='A') +\
            pg_search.SearchVector('title', weight='A')+\
            pg_search.SearchVector('description', weight='C')
    return self.get_queryset().annotate(document=vector)


class Product(TimeStampedModel):
    product_id = models.AutoField(primary_key=True, )
    shop = models.ForeignKey('Shop', to_field='shop_name')
    brand = models.ForeignKey('Brand', to_field='brand_name')
    title = models.TextField(blank=False, null=False)
    description = models.TextField(blank=True, null=True)

search_vector = pg_search.SearchVectorField(null=True)

objects = ProductManager()

class Meta:
    indexes = [
        indexes.GinIndex(fields=['search_vector'], name='title_index')
    ]

#update search_vector every time the entry updates
def save(self, *args, **kwargs):
    super().save(*args, **kwargs)
    if 'update_fields' not in kwargs or 'search_vector' not in kwargs['update_fields']:
        instance = self._meta.default_manager.with_documents().get(pk=self.pk)
        instance.search_vector = instance.document
        instance.save(update_fields=['search_vector'])

views.py

def get_cosmetic(request):
if request.method == "GET":
    pass

else:
    search_words = request.POST.get('search')    
    products = ?????????
    return render(request, 'example.html', {"products": products})

回答1:

Answering my own question:

products = Product.objects.annotate(rank=SearchRank(F('search_vector'), search_words))
                          .filter(search_vector=search_words)
                          .order_by('-rank')


This means you should search your index field - in my case search_vector field.
Also I have changed my code a bit in ProductManager() class, so now I can just use

products = Product.objects.with_documents(search_words)

Where with_documents() is a custom function of custom ProductManager(). The recipe of this change is here (page 30).

What does all this code do:
1) creates search_vector with scores to fields, field with bigger score - gets higher place in result sorting.
2) creates GIN index for full-text search via ORM Django
3) updates GIN index every time the instance of model is changed

What this code dosn't do:
1) It doesn't sort by relevance of substring which is queried. Possible solution.

Hope this will help somebody with a bit complicated full-text search in Django.