I have set of records which I need to search using criteria. But criteria is returning me multiple rows.
So I need top 2 records which are having maximum percentage of criteria matching.
I worked on fuzzy logic but found that it is too complex for such simple problems. I have scenarios like below:
SELECT DISTINCT FirstName, LastName, CountryName, StateName FROM Employee
Say for example above one is returning me 5 records.
What I want is like use "like" operator thru which I can find that statename like '%Gujarat%' & countryname like '%India%' matching percentage with above five records.
Once I got this matching percentage, I will select top 2 records with highest amount of matching percentage.
This will lead me to get somewhat accurate data.
Any idea using sql server?
As far as I understand you need something like Fuzzy String Matching using Levenshtein Distance Algorithm. Hope the link will be helpful.
You need to calculate distance between CountryName and search pattern. It's not exactly the "percentage", but it can measure the relevance.
Maybe this solves your problem?
SELECT TOP 2 FirstName, LastName, CountryName, StateName
FROM Employee
WHERE
statename like '%Gujarat%' AND countryname like '%India%'
ORDER BY
dbo.edit_distance(statename, 'Gujarat') + dbo.edit_distance(CountryName, 'India') DESC
You could use Full text search. Using ContainsTable you can get a RANK for each record describing how weel it fit the search pattern. Then you can order your results by that rank and then use select top N
to get only the best N results.
Implementing full text search is easy and fast, specially if you need simple queries like yours.
Resources:
- Implementing full text search and basic usage.
- Part 3 of a series, focused on ranked queries with
containstable
and freetexttable
.
- ContainsTable reference. Also you can find a lot of info about this here on stackoverflow.
Hope it helps.
Given solutions not worked for me,
So I created my own logic:
SELECT TOP 2 FirstName, LastName, CountryName, StateName
FROM Employee
WHERE
statename like '%Gujarat%' AND countryname like '%India%'
ORDER BY
LEN(StateName + CountryName) - LEN(REPLACE(StateName, 'Gujarat', '') + REPLACE(CountryName, 'India', '')) DESC
Hope this help...