I'm using a CONTAINSTABLE query with MS SQL Server's full-text indexing engine to search within a textual column; e.g.:
SELECT *
FROM MyTable
INNER MERGE JOIN CONTAINSTABLE(MyTable, sDescription, 'FORMSOF(INFLECTIONAL, "brains")')
AS TBL1 ON TBL1.[key]=MyTable.ixKey
This does a great job of finding rows with a description including a word like "brains" (e.g. "brain", "brained"). However, when I display these results to the user, I'd like to highlight the word that matched their query (just like Google). But I can't just look for the search term in the results: if the result contains "brain", I obviously can't highlight "brains".
Can SQL Server tell me where in the column (either the word or character) the full-text match occurs? Alternatively, can I manually run the stemmer to get all the forms of the search term? I could highlight each of those individually, then.
SQL Server 2008 includes a function to get the inflected forms of a word or phrase, using the full-text engine's parser: sys.dm_fts_parser.
SELECT display_term, source_term, occurrence FROM sys.dm_fts_parser('FORMSOF(INFLECTIONAL, "brains")', 1033, 0, 0)
gets a table like:
display_term | source_term | occurrence
---------------------------------------
brain | brains | 1
brains | brains | 1
brained | brains | 1
(Working with query phrases is a bit more work, as it inflects each word separately, but it's not too hard to put things back together.)
Now I can just highlight any occurrence of any of the inflected forms. It's a bit more work than if SQL Server just told me where the FTS matches are, but it'll do.
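For what it's worth, here is a rough sketch of that highlighting step in Python. The language is just an assumption about the display layer, and the rows are hard-coded to mirror the table above rather than read from the database. The names build_pattern and highlight and the <b> markers are made up for illustration. It also assumes that all forms of one query word share the same occurrence value, so a phrase can be put back together by grouping on occurrence.

```python
import re
from collections import defaultdict

# Hypothetical rows as (display_term, occurrence), mirroring the table above.
ROWS = [("brain", 1), ("brains", 1), ("brained", 1)]


def build_pattern(rows):
    """Build a regex that matches one form per query-word position, in order."""
    # Assumption: every form of the same query word has the same occurrence.
    forms_by_position = defaultdict(set)
    for display_term, occurrence in rows:
        forms_by_position[occurrence].add(display_term)

    parts = []
    for occurrence in sorted(forms_by_position):
        # Longest first, so a longer form is tried before its own prefix.
        forms = sorted(forms_by_position[occurrence], key=lambda f: (-len(f), f))
        parts.append("(?:" + "|".join(re.escape(f) for f in forms) + ")")

    # \b keeps "brain" from matching inside "brainstorm";
    # \s+ joins the words of a phrase.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def highlight(text, rows, before="<b>", after="</b>"):
    """Wrap every match of any inflected form in the given markers."""
    if not rows:
        return text
    pattern = build_pattern(rows)
    return pattern.sub(lambda m: before + m.group(0) + after, text)


print(highlight("Brains! The zombie wanted a brain, not a brainstorm.", ROWS))
# <b>Brains</b>! The zombie wanted a <b>brain</b>, not a brainstorm.
```

If the output is HTML, escape the description text before calling highlight so the markers are the only markup in the result.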
The expansion_type column in the result tells you how each display_term was produced.
An expansion type of 0 means the term was not expanded, 2 is an INFLECTIONAL form, and 4 is a thesaurus expansion:
FORMSOF(THESAURUS, "Co")
source_term  display_term  expansion_type
Co           co            0
Co           company       4
FORMSOF(INFLECTIONAL, "Dog")
source_term  display_term  expansion_type
Dog          dog           0
Dog          dogs          2
Dog          dogged        2
Dog          dogging       2
SQL
SELECT
source_term,
display_term,
expansion_type
FROM sys.dm_fts_parser('FORMSOF(INFLECTIONAL, "Dog")', 1033, 0, 0)
ORDER BY source_term, expansion_type
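If you only want to highlight some kinds of expansion (say, inflections but not thesaurus matches), you can filter on expansion_type once the rows are in your application. A small sketch in Python, with the rows copied from the tables above and hard-coded for illustration. The helper name terms_for is made up, and Python is just an assumed application language.

```python
# Rows as (source_term, display_term, expansion_type), copied from the tables above.
ROWS = [
    ("Co", "co", 0),
    ("Co", "company", 4),
    ("Dog", "dog", 0),
    ("Dog", "dogs", 2),
    ("Dog", "dogged", 2),
    ("Dog", "dogging", 2),
]

# Codes as described above: 0 = not expanded, 2 = inflectional, 4 = thesaurus.
NO_EXPANSION, INFLECTIONAL, THESAURUS = 0, 2, 4


def terms_for(rows, source_term, allowed=(NO_EXPANSION, INFLECTIONAL)):
    """Return the display terms for one source term, limited to the allowed expansion types."""
    return [display for source, display, expansion in rows
            if source == source_term and expansion in allowed]


print(terms_for(ROWS, "Dog"))  # ['dog', 'dogs', 'dogged', 'dogging']
print(terms_for(ROWS, "Co"))   # ['co']
```

Passing allowed=(NO_EXPANSION, INFLECTIONAL, THESAURUS) would include "company" for "Co" as well.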