Solr FieldCollapsing for More Like This queries

Posted 2019-06-01 20:24

Question:

I want to use a "More Like This" query to find similar documents and collapse those that have the same value in the field 'image'. I tried the Field Collapsing (result grouping) parameters, but they do not seem to work with "More Like This".

Below is a snippet of my code. How can I collapse the results of a "More Like This" query?

$url = "http://{$host}:{$port}/solr/{$core}/mlt";

$data = [
    'stream.body' => $content,
    'fl' => 'image,content,title,signature',
    'start' => 0,
    'sort' => "score desc",
    'wt' => 'json',
    'mlt.fl' => 'content,title',
    // these lines do nothing ---v
    'group' => 'true',
    'group.field' => 'image',
    'group.sort' => 'impressions desc',
    'group.main' => 'true'
];

$curlHandle = curl_init($url);

$options = array (
        CURLOPT_POST => 1,
        CURLOPT_POSTFIELDS => $data,
        CURLOPT_RETURNTRANSFER => true // return the response body instead of printing it
);

curl_setopt_array($curlHandle, $options);
$result = json_decode(curl_exec($curlHandle));

Answer 1:

General answer

I could not collapse the results using the Field Collapsing (result grouping) parameters. However, I was able to achieve the desired result with the CollapsingQParserPlugin. The following filter query collapses documents on the field 'image' and, within each group, keeps the document with the highest value in the field 'impressions': {!collapse field=image max=impressions}
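If the collapse filter has to be built from variables, a small helper keeps the local-params syntax in one place. This is a minimal sketch: buildCollapseFilter is a hypothetical name, and it only covers the field, max, min and nullPolicy local parameters of the CollapsingQParserPlugin.

```php
<?php
// Hypothetical helper: builds a {!collapse ...} filter query string for Solr's
// CollapsingQParserPlugin. Only field, max/min and nullPolicy are handled here.
function buildCollapseFilter(
    string $field,
    ?string $maxField = null,
    ?string $minField = null,
    ?string $nullPolicy = null
): string {
    if ($maxField !== null && $minField !== null) {
        throw new InvalidArgumentException('Use either max or min, not both');
    }
    $parts = ["field={$field}"];
    if ($maxField !== null) {
        $parts[] = "max={$maxField}"; // group head = document with the highest value
    }
    if ($minField !== null) {
        $parts[] = "min={$minField}"; // group head = document with the lowest value
    }
    if ($nullPolicy !== null) {
        $parts[] = "nullPolicy={$nullPolicy}"; // ignore, expand or collapse
    }
    return '{!collapse ' . implode(' ', $parts) . '}';
}

echo buildCollapseFilter('image', 'impressions'), PHP_EOL;
// prints {!collapse field=image max=impressions}
```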

Implementation

I was not able to combine this filter query with my other filter queries under a single fq key, as follows:

$filterQueries = [
    "-signature:{$signature}",
    // ... other filter queries
    "{!collapse field=image max=impressions}"
];
$data = [
    // ... other parameters
    'fq' => implode(' AND ', $filterQueries),
    // ... other parameters
];

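This produced the error: Query does not implement createWeight

The likely cause is that the collapse filter is a post filter: it has to be sent as its own top-level fq parameter, and when it is joined to other clauses with AND it ends up nested inside a Boolean query, which it does not support.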

My fix was to send a GET request instead of the POST used in the question. With a GET request, each filter query can be given its own fq key: http://solr-url/mlt?...&fq=-signature%3A0&...&fq=%7B!collapse+field%3Dimage+max%3Dimpressions%7D
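The parameters have to be assembled by hand because PHP arrays cannot hold the same key twice, and the built-in http_build_query() turns an array value into indexed keys (fq[0]=...&fq[1]=...), which Solr treats as unknown parameters rather than as repeated fq values. A minimal sketch of a builder that emits one fq= pair per filter; buildRepeatedQuery is a hypothetical name.

```php
<?php
// Hypothetical helper: encodes parameters as a query string in which every
// element of an array value is emitted under the same key (fq=a&fq=b).
function buildRepeatedQuery(array $data): string
{
    $pairs = [];
    foreach ($data as $key => $value) {
        // Scalars are cast to a one-element array so both cases share one loop
        foreach ((array) $value as $item) {
            $pairs[] = urlencode((string) $key) . '=' . urlencode((string) $item);
        }
    }
    return implode('&', $pairs);
}

$data = ['fq' => ['-signature:0', '{!collapse field=image max=impressions}']];

echo http_build_query($data), PHP_EOL;   // fq%5B0%5D=...&fq%5B1%5D=... (indexed keys)
echo buildRepeatedQuery($data), PHP_EOL; // fq=...&fq=... (one fq parameter per filter)
```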

Below is the PHP solution for the snippet in the question:

$url = "http://{$host}:{$port}/solr/{$core}/mlt?"; // Note the added question mark

$data = [
    'stream.body' => $content,
    'fl' => 'image,content,title,signature',
    'fq' => $filterQueries, // the array from above; each entry becomes its own fq parameter
    'start' => 0,
    'sort' => "score desc",
    'wt' => 'json',
    'mlt.fl' => 'content,title'
];

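// Build the query string by hand: every element of an array value is emitted
// under a repeated key (fq=...&fq=...), which Solr reads as separate filters
$params = [];
foreach ($data as $key => $value) {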
    if (is_array($value)) {
        foreach ($value as $subvalue) {
            $subvalue = urlencode($subvalue);
            $params[] = "{$key}={$subvalue}";
        }
    } else {
        $value = urlencode($value);
        $params[] = "{$key}={$value}";
    }
}
$url .= implode('&', $params);

$curlHandle = curl_init($url);
$options = [
    CURLOPT_RETURNTRANSFER => true // return the response body instead of printing it
];
curl_setopt_array($curlHandle, $options);
$result = json_decode(curl_exec($curlHandle));
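Because the whole document travels in stream.body, the GET URL can get long. A possible alternative, not verified here, is to send the same repeated-key string as a POST body. It assumes Solr's default request parsing, which accepts form-encoded POST bodies. When CURLOPT_POSTFIELDS is given a string instead of an array, cURL sends it as application/x-www-form-urlencoded (an array is sent as multipart/form-data), so the repeated fq keys survive. buildMltBody and fetchSimilar are hypothetical names, and reading the documents from response.docs assumes the usual JSON response layout.

```php
<?php
// Hypothetical helper: form-encodes the MLT parameters with one key per filter query.
function buildMltBody(string $content, array $filterQueries): string
{
    $data = [
        'stream.body' => $content,
        'fl' => 'image,content,title,signature',
        'fq' => $filterQueries,
        'start' => 0,
        'sort' => 'score desc',
        'wt' => 'json',
        'mlt.fl' => 'content,title',
    ];
    $pairs = [];
    foreach ($data as $key => $value) {
        foreach ((array) $value as $item) {
            $pairs[] = urlencode($key) . '=' . urlencode((string) $item);
        }
    }
    return implode('&', $pairs);
}

// Hypothetical helper: POSTs the body to the /mlt handler and returns the docs.
// A string CURLOPT_POSTFIELDS is sent as application/x-www-form-urlencoded.
function fetchSimilar(string $url, string $body): array
{
    $curlHandle = curl_init($url); // no trailing "?" needed, parameters go in the body
    curl_setopt_array($curlHandle, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $body,
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $raw = curl_exec($curlHandle);
    if ($raw === false) {
        throw new RuntimeException('Request to Solr failed: ' . curl_error($curlHandle));
    }
    $result = json_decode($raw, true);
    return $result['response']['docs'] ?? []; // assumed JSON layout
}
```

With the variables from the question, a call would look like: $docs = fetchSimilar("http://{$host}:{$port}/solr/{$core}/mlt", buildMltBody($content, $filterQueries));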