Boost results by an integer field

Posted 2019-08-23 08:38

Question:

I'm trying to create an autocomplete for destinations, and I want to boost results by an integer popularity field.

I'm trying with this function_score query:

'query' => [
                'function_score' => [
                    'query' => [
                        "bool" => [
                            "should" => [   
                                 [
                                    "multi_match"=>[
                                        "query"=>$text,
                                        "fields"=>[
                                           "destination_name_*"
                                        ],
                                        "type"=>"most_fields",
                                        "boost" => 2
                                    ]
                                ],
                                [
                                    "multi_match"=>[
                                        "query"=>$text,
                                        "fields"=>[
                                           "destination_name_*"
                                        ],
                                        "fuzziness" => "1",
                                        "prefix_length"=> 2                                   
                                    ]
                                ],
                                [
                                    "multi_match"=>[
                                        "query"=>$text,
                                        "fields"=>[
                                           "destination_name_*.exact"
                                        ],
                                        "boost" => 2                                   
                                    ]
                                ]
                            ]
                        ]
                    ],
                    'field_value_factor' => [
                        'field'=>'popularity'
                    ]
                ],
            ],

Mapping & settings:

'settings' => [ 
                'analysis' => [     
                    'filter' =>  [
                        'ngram_filter' => [
                            'type' => 'edge_ngram',
                            'min_gram' => 2,
                            'max_gram' => 20,
                        ]
                    ],
                    'analyzer' => [
                        'ngram_analyzer' => [
                            'type'      => 'custom',
                            "tokenizer" => "standard",
                            'filter'    => ['lowercase', 'ngram_filter'],
                        ]

                    ]
                ],   
            ],
            'mappings' =>[
                'doc' => [
                    "properties"=> [
                        "destination_name_en"=> [
                           "type"=> "text",
                           "term_vector"=> "yes",
                           "analyzer"=> "ngram_analyzer",
                           "search_analyzer"=> "standard",
                           "fields" => [
                                "exact" => [
                                    "type" => "text",
                                    "analyzer" => "standard"
                                ]
                           ]
                        ],
                        "destination_name_es"=> [
                           "type"=> "text",
                           "term_vector"=> "yes",
                           "analyzer"=> "ngram_analyzer",
                           "search_analyzer"=> "standard",
                           "fields" => [
                                "exact" => [
                                    "type" => "text",
                                    "analyzer" => "standard"
                                ]
                           ]
                        ],
                        "destination_name_pt"=> [
                           "type"=> "text",
                           "term_vector"=> "yes",
                           "analyzer"=> "ngram_analyzer",
                           "search_analyzer"=> "standard",
                           "fields" => [
                                "exact" => [
                                    "type" => "text",
                                    "analyzer" => "standard"
                                ]
                           ]
                        ],
                        "popularity"=> [
                           "type"=> "integer",
                        ]
                    ]
                ]
            ] 

I set the popularity of Cancún to 10, and when I start typing "ca" the first option is Cancún. This works as expected.

The problem comes when I try to find another city whose popularity value is 0, such as Puerto Vallarta. When I type "Puerto Va" I get the following results:

1. Val d'Aosta, 2. Puerto Lopez, 3. Bristol - VA, and many others (but not Puerto Vallarta).

It is important to emphasize that without the function_score and field_value_factor, this query works as expected (it returns Puerto Vallarta in the first position).

I want to add the ability to boost popular cities using an integer value.

Any suggestion?

Thanks!

Answer 1:

By default, your field_value_factor multiplies the natural score by the value of the popularity field. So if the value is 0 for Puerto Vallarta, its score will always be 0: it will match, but it will never appear among the first results.
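
To make the multiplication concrete, here is a minimal Python sketch of that default behaviour (no modifier, score multiplied by the field value). The relevance scores and the helper name boosted_score are made up for illustration; they are not real Elasticsearch output.

```python
# Hypothetical relevance scores, chosen only to illustrate the effect.
def boosted_score(query_score: float, popularity: int) -> float:
    """Default field_value_factor behaviour: query score * field value."""
    return query_score * popularity

# A weak match with popularity 10 versus a strong match with popularity 0.
cancun = boosted_score(query_score=1.2, popularity=10)
puerto_vallarta = boosted_score(query_score=8.5, popularity=0)

print(cancun)           # positive, driven mostly by popularity
print(puerto_vallarta)  # zero, no matter how good the text match was
```

However well "Puerto Va" matches the document, multiplying by a popularity of 0 wipes the score out.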

In addition, your boost will be linear, which is almost certainly not what you want, since popular cities will completely overwhelm the results list.

You should therefore use the modifier property of field_value_factor (see the Elasticsearch function_score documentation).

If you set it to log2p, it should work as expected. The log2p modifier adds 2 to the value of the popularity field before applying a base-10 logarithm, so a popularity of 0 gives log(2) ≈ 0.30 instead of zeroing the score. The difference in boost between a city with popularity 2 and one with popularity 4 will still be noticeable, but the difference shrinks as the popularity value rises.

For example (log is base 10):

popularity 2 => log(4) => 0.60
popularity 4 => log(6) => 0.78
popularity 20 => log(22) => 1.34
popularity 22 => log(24) => 1.38
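
These values can be reproduced with a short Python sketch. The function name log2p below is just a local helper that mirrors the modifier's definition (add 2, then take the base-10 logarithm); it is not an Elasticsearch API.

```python
import math

def log2p(popularity: float) -> float:
    """Mirror of the log2p modifier: add 2 to the value, then take log base 10."""
    return math.log10(popularity + 2)

# Print the factor applied to the query score for a few popularity values.
for popularity in (0, 2, 4, 20, 22):
    print(popularity, round(log2p(popularity), 2))
```

The step from 2 to 4 changes the factor by about 0.18, while the step from 20 to 22 changes it by only about 0.04, which is the flattening effect described above.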

Add this to your query:

                'field_value_factor' => [
                    'field'=>'popularity',
                    'modifier' => 'log2p', // add this
                ]
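
As a rough illustration of how the ranking changes, the following Python sketch re-ranks three candidates for "Puerto Va" with and without the modifier. The relevance scores and popularity values are hypothetical, chosen only to mimic the situation in the question, and rank is an illustrative helper, not part of any client library.

```python
import math

# Hypothetical (name, relevance score, popularity) tuples for the input "Puerto Va".
hits = [
    ("Puerto Vallarta", 9.0, 0),
    ("Puerto Lopez", 3.0, 3),
    ("Val d'Aosta", 2.0, 5),
]

def rank(hits, factor):
    """Sort hit names by relevance score * factor(popularity), best first."""
    scored = [(score * factor(pop), name) for name, score, pop in hits]
    return [name for _, name in sorted(scored, reverse=True)]

# No modifier: the raw popularity is the factor, so popularity 0 gives score 0.
no_modifier = rank(hits, lambda pop: pop)
# log2p: the factor is log10(popularity + 2), which is never zero for popularity >= 0.
with_log2p = rank(hits, lambda pop: math.log10(pop + 2))

print(no_modifier)
print(with_log2p)
```

With these made-up numbers, the unmodified factor pushes Puerto Vallarta to the bottom, while log2p lets the strong text match win. A very popular city can still outrank a better text match under log2p, just far less aggressively.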