Google App Engine Search API

Published 2019-01-26 12:45

Question:

When querying a search index with the Python version of the GAE Search API, what is the best practice for returning documents whose title matches the search words first, followed by documents whose body matches them?

For example given:

body = """This is the body of the document, 
with a set of words"""

my_document = search.Document(
  fields=[
    search.TextField(name='title', value='A Set Of Words'),
    search.TextField(name='body', value=body),
   ])

If it is possible, how might one perform a search on an index of Documents of the above form with results returned in this priority, where the phrase being searched for is in the variable qs:

  1. Documents whose title matches the qs; then
  2. Documents whose body matches the qs words.

It seems like the correct solution is to use a MatchScorer, but I may be off the mark, as I have not used this search functionality before. The documentation does not make clear how to use the MatchScorer. I presume one subclasses it and overrides some method, but since this is not documented and I have not delved into the code, I cannot say for sure.

Is there something here that I am missing, or is this the correct strategy? Did I miss where this sort of thing is documented?


Just for clarity here is a more elaborate example of the desired outcome:

documents = [
  dict(title="Alpha", body="A"),          # "Alpha"
  dict(title="Beta", body="B Two"),       # "Beta"
  dict(title="Alpha Two", body="A"),      # "Alpha2"
]

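for doc in documents:
  # Each entry is a dict, so fields are read by key rather than by attribute.
  document = search.Document(
    fields=[
       search.TextField(name="title", value=doc["title"]),
       search.TextField(name="body", value=doc["body"]),
    ]
  )
  index.put(document)  # for some search.Index; put the Document, not the dict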

# Then when we search, we search the Title and Body.
index.search("Alpha")
# returns [Alpha, Alpha2]

# Results where the search is found in the Title are given higher weight.
index.search("Two")
# returns [Alpha2, Beta]  -- note Alpha2 has 'Two' in the title.

Answer 1:

Custom scoring is one of our top priority feature requests. We're hoping to have a good way to do this sort of thing as soon as possible.

In your particular case, you could of course achieve the desired result by doing two separate queries: the first one with a field restriction on "title", and the second restricted to "body".
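
A minimal sketch of that two-query approach might look like the following. The helper name prioritized_search and its run_query parameter are hypothetical, introduced only for illustration. It assumes run_query is any callable that takes a query string and returns an iterable of results with a doc_id attribute (with the Search API, index.search should fit, since it accepts a query string), and that qs is plain words with no query operators in it. Wrapping the words in field:(...) is also an assumption about how you want the restriction expressed.

```python
def prioritized_search(run_query, qs):
    """Return title matches first, then body-only matches, without duplicates.

    run_query: hypothetical callable taking a query string and returning an
               iterable of results that expose a doc_id attribute
               (for example, the search method of a search.Index).
    qs:        the words to search for, assumed free of query operators.
    """
    seen = set()
    ordered = []
    # Query the higher-priority field first, then the lower-priority one.
    for field in ("title", "body"):
        # Field-restricted query string, e.g. title:(Two)
        for doc in run_query("%s:(%s)" % (field, qs)):
            # A document matching in both fields is kept only once,
            # at the position of its title match.
            if doc.doc_id not in seen:
                seen.add(doc.doc_id)
                ordered.append(doc)
    return ordered
```

With an index like the one in the question, this could be called as prioritized_search(index.search, "Two"). Note that the ordering within each group is whatever each individual query returns, and the cost is two round trips instead of one.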