Distinct values in Azure Search Suggestions?

Posted 2019-06-27 05:21

Question:

I am offloading the search feature of my relational database to Azure Search. My Products table contains columns such as serialNumber, partNumber, etc. (multiple serialNumbers can share the same partNumber).

I want to create a suggester that autocompletes partNumbers, but I am getting a lot of duplicate suggestions because the same partNumber matches multiple documents.

How can I solve this problem?

Answer 1:

The Suggest API suggests documents, not queries. If you repeat the partNumber for each serialNumber in your index and then suggest on partNumber, you get one result per matching document. You can see this more clearly by including the key field in the $select parameter. Azure Search eliminates duplicates within the same document, but not across documents, so you will have to remove them on the client side or build a secondary index of partNumbers just for suggestions.
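A minimal sketch of the client-side option, in C# like the other answers. It does not call Azure Search: the hypothetical `hits` list stands in for the suggest results (one entry per matching document, in ranked order), and `DistinctPartNumbers` is an illustrative helper, not part of the SDK. It keeps the first hit for each partNumber, so the service's ranking is preserved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a suggest response: one (Key, PartNumber) pair per matching
// document, in ranked order. Several serialNumbers (keys) share the same partNumber.
var hits = new List<(string Key, string PartNumber)>
{
    ("SN-001", "PN-100"),
    ("SN-002", "PN-100"),
    ("SN-003", "PN-200"),
    ("SN-004", "PN-100"),
    ("SN-005", "PN-300"),
};

var distinct = DistinctPartNumbers(hits, 5);
Console.WriteLine(string.Join(", ", distinct)); // PN-100, PN-200, PN-300

// Keeps the first (highest-ranked) hit for each partNumber and stops after `take` values.
static List<string> DistinctPartNumbers(IEnumerable<(string Key, string PartNumber)> results, int take)
{
    var seen = new HashSet<string>();
    var output = new List<string>();
    foreach (var (_, partNumber) in results)
    {
        if (output.Count >= take) break;
        // HashSet.Add returns false if this partNumber was already emitted.
        if (seen.Add(partNumber)) output.Add(partNumber);
    }
    return output;
}
```

Since the service caps how many documents one request returns, you may need to ask for more results than you intend to show so that enough distinct values survive.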

See this forum thread for a more in-depth discussion.

Also, feel free to vote on this UserVoice item to help us prioritize improvements to Suggestions.
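For the secondary-index option, the main work is collapsing the rows to one document per partNumber before uploading them to a separate, suggestions-only index. The sketch below shows only that shaping step, using hypothetical in-memory rows; `BuildPartNumberDocs` and the document shape are illustrative, and the upload through the SDK is left out. Encoding the key as URL-safe Base64 is an assumption based on the rule that document keys may contain only letters, digits, dashes, underscores and equal signs, which a raw part number might violate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical rows from the relational Products table: many serialNumbers per partNumber.
var products = new List<(string SerialNumber, string PartNumber)>
{
    ("SN-001", "PN-100"),
    ("SN-002", "PN-100"),
    ("SN-003", "PN-200/A"),
    ("SN-004", "PN-100"),
};

foreach (var doc in BuildPartNumberDocs(products))
    Console.WriteLine($"{doc.Key} -> {doc.PartNumber} ({doc.SerialCount} serials)");

// Collapses the rows to one document per partNumber (GroupBy keeps first-seen order).
// The key is URL-safe Base64 of the part number ('+' -> '-', '/' -> '_'), an assumed
// encoding that keeps only letters, digits, '-', '_' and '=' in the key.
static List<(string Key, string PartNumber, int SerialCount)> BuildPartNumberDocs(
    IEnumerable<(string SerialNumber, string PartNumber)> rows)
{
    return rows
        .GroupBy(r => r.PartNumber)
        .Select(g => (
            Key: Convert.ToBase64String(Encoding.UTF8.GetBytes(g.Key)).Replace('+', '-').Replace('/', '_'),
            PartNumber: g.Key,
            SerialCount: g.Count()))
        .ToList();
}
```

Suggestions from such an index are unique per partNumber by construction, at the cost of keeping a second index in sync with the Products table.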



Answer 2:

I'm facing this problem myself. My solution does not involve a new index (that would only get messy and cost money).

My approach is a while loop that adds each found 'UserIdentity' (in your case, 'partNumber') to an exclusion filter and searches again until the take/top limit is met or no more suggestions exist:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

public async Task<List<MachineSuggestionDTO>> SuggestMachineUser(string searchText, int take, string[] searchFields)
{
    var indexClientMachine = _searchServiceClient.Indexes.GetClient(INDEX_MACHINE);
    var suggestions = new List<MachineSuggestionDTO>();

    var sp = new SuggestParameters
    {
        UseFuzzyMatching = true,
        Top = 100 // Request the maximum per call to reduce the number of round trips.
    };

    // Restrict the suggester to specific fields if any were passed in.
    if (searchFields != null && searchFields.Length != 0)
    {
        sp.SearchFields = searchFields;
    }

    // Loop until we have the desired number of distinct suggestions, or until no more matches exist.
    while (suggestions.Count < take)
    {
        if (!await DistinctSuggestMachineUser(searchText, take, suggestions, indexClientMachine, sp))
        {
            // No more suggestions were found, so stop searching.
            break;
        }
    }

    // Defensive trim; each round only adds as many items as are still missing.
    return suggestions.Take(take).ToList();
}

private async Task<bool> DistinctSuggestMachineUser(string searchText, int take, List<MachineSuggestionDTO> suggestions, ISearchIndexClient indexClientMachine, SuggestParameters sp)
{
    var response = await indexClientMachine.Documents.SuggestAsync<MachineSearchDocument>(searchText, SUGGESTION_MACHINE, sp);

    if (response.Results.Count == 0)
    {
        // No more suggestions were found.
        return false;
    }

    var exclusions = new List<string>();

    // Keep one result per UserIdentity, and only as many as are still needed.
    // DistinctBy is in System.Linq on .NET 6+; on older frameworks it comes from MoreLINQ.
    foreach (var result in response.Results.DistinctBy(r => r.Document.UserIdentity).Take(take - suggestions.Count))
    {
        var d = result.Document;
        suggestions.Add(new MachineSuggestionDTO { Id = d.UserIdentity, Namn = d.UserNamn, Hkod = d.UserHkod, Intnr = d.UserIntnr });

        // Exclude this UserIdentity in the next round. OData string literals escape ' as ''.
        exclusions.Add($"UserIdentity ne '{d.UserIdentity.Replace("'", "''")}'");
    }

    // Append the new exclusions to the filter built in earlier rounds (if any).
    var newClauses = string.Join(" and ", exclusions);
    sp.Filter = string.IsNullOrEmpty(sp.Filter) ? newClauses : sp.Filter + " and " + newClauses;

    return true;
}
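The exclusion filter that the loop above grows can also be built in one place, which makes the quoting explicit. This is a sketch; `BuildExclusionFilter` is a hypothetical helper, and the only syntax rule it relies on is that a single quote inside an OData string literal is written as two single quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var filter = BuildExclusionFilter("UserIdentity", new[] { "u1", "o'brien" });
Console.WriteLine(filter); // UserIdentity ne 'u1' and UserIdentity ne 'o''brien'

// Joins one "field ne 'value'" clause per excluded value with " and ".
// Single quotes inside values are doubled, as OData string literals require.
// Returns an empty string for an empty list, so the caller should only set
// SuggestParameters.Filter when the result is non-empty.
static string BuildExclusionFilter(string field, IEnumerable<string> excluded)
{
    var clauses = excluded.Select(value => $"{field} ne '{value.Replace("'", "''")}'");
    return string.Join(" and ", clauses);
}
```

Because the filter grows every round, very long exclusion lists may eventually hit the service's limits on filter size or complexity; the Azure Search OData filter syntax also offers a `search.in` function that is worth checking for that case.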


Answer 3:

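using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

public async Task<List<string>> SuggestionsAsync(bool highlights, bool fuzzy, string term)
{
    var sp = new SuggestParameters
    {
        UseFuzzyMatching = fuzzy,
        Top = 100 // The Suggest API returns at most 100 results per request.
    };

    if (highlights)
    {
        sp.HighlightPreTag = "<em>";
        sp.HighlightPostTag = "</em>";
    }

    var suggestResult = await searchConfig.IndexClient.Documents.SuggestAsync(term, "mysuggestion", sp);

    // Each result is a document, and many documents can share the same suggested text,
    // so collapse duplicates on the client and keep the first 10 distinct texts.
    return suggestResult.Results.Select(x => x.Text).Distinct().Take(10).ToList();
}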

Requesting the top 100 suggestions and then applying Distinct() on the client works for me.
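One caveat with this approach: 100 is the most the Suggest API returns per request, so if the top 100 hits are dominated by a few values you can get fewer than 10 distinct suggestions even though more exist. A sketch of detecting that case, with `DistinctFromWindow` as a hypothetical helper and a synthetic result list in place of a real response:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Synthetic window of 100 suggestion texts: 95 hits for one part number, 5 for another.
var window = Enumerable.Repeat("PN-100", 95).Concat(Enumerable.Repeat("PN-200", 5)).ToList();

var (distinctTexts, mayBeTruncated) = DistinctFromWindow(window, 10, 100);
Console.WriteLine($"{string.Join(", ", distinctTexts)} | may be truncated: {mayBeTruncated}");
// PN-100, PN-200 | may be truncated: True

// Returns up to `take` distinct texts in ranked order, plus a flag that is true when the
// window came back full (Count == top) but still held fewer than `take` distinct values,
// i.e. more matching documents with other texts may exist beyond the window.
static (List<string> Texts, bool MayBeTruncated) DistinctFromWindow(IReadOnlyList<string> results, int take, int top)
{
    var seen = new HashSet<string>();
    var found = new List<string>();
    foreach (var value in results)
    {
        if (found.Count >= take) break;
        if (seen.Add(value)) found.Add(value);
    }
    bool truncated = found.Count < take && results.Count == top;
    return (found, truncated);
}
```

When the flag is set, a caller could fall back to the filter-and-repeat loop from Answer 2.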