how to select record whose matching percentage is

2019-02-27 02:46发布

问题:

I have set of records which I need to search using criteria. But criteria is returning me multiple rows.

So I need top 2 records which are having maximum percentage of criteria matching.

I worked on fuzzy logic but found that it is too complex for such simple problems. I have scenarios like below:

SELECT DISTINCT FirstName, LastName, CountryName, StateName FROM Employee

Say for example above one is returning me 5 records.

What I want is like use "like" operator thru which I can find that statename like '%Gujarat%' & countryname like '%India%' matching percentage with above five records.

Once I got this matching percentage, I will select top 2 records with highest amount of matching percentage.

This will lead me to get somewhat accurate data.

Any idea using sql server?

回答1:

As far as I understand you need something like Fuzzy String Matching using Levenshtein Distance Algorithm. Hope the link will be helpful.

You need to calculate distance between CountryName and search pattern. It's not exactly the "percentage", but it can measure the relevance.

Maybe this solves your problem?

SELECT TOP 2 FirstName, LastName, CountryName, StateName 
FROM Employee
WHERE
    statename like '%Gujarat%' AND countryname like '%India%'
ORDER BY
    dbo.edit_distance(statename, 'Gujarat') + dbo.edit_distance(CountryName, 'India') DESC


回答2:

You could use Full text search. Using ContainsTable you can get a RANK for each record describing how weel it fit the search pattern. Then you can order your results by that rank and then use select top N to get only the best N results.

Implementing full text search is easy and fast, specially if you need simple queries like yours.

Resources:

  • Implementing full text search and basic usage.
  • Part 3 of a series, focused on ranked queries with containstable and freetexttable.
  • ContainsTable reference. Also you can find a lot of info about this here on stackoverflow.

Hope it helps.



回答3:

Given solutions not worked for me,

So I created my own logic:

SELECT TOP 2 FirstName, LastName, CountryName, StateName 
FROM Employee
WHERE
    statename like '%Gujarat%' AND countryname like '%India%'
ORDER BY
    LEN(StateName + CountryName) - LEN(REPLACE(StateName, 'Gujarat', '') + REPLACE(CountryName, 'India', '')) DESC

Hope this help...