Escaping an ampersand in SQL Server Full-Text Sear

2019-02-24 18:40发布

问题:

I have a very peculiar case. My ASP.NET page calls a stored procedure of ours that performs a Full-Text Search query on our database. Some of the commonly searched strings include an ampersand because a few brands of our products (well-known brands, too) have an & in their name.

It turns out that in a certain case I get no results unless I escape the ampersand (\&), and in a certain other case I get no results only if I escape the ampersand.

I don't know if this is relevant, but (without giving out the brand names) one ends in &b and the other one in &c.

Is it possible that these strings (&b or &c) have some special meaning of their own? And that by escaping them I'm actually passing a special string to T-SQL?

EDIT

Additional info: after further testing, I proved that the error is in the stored procedure itself. Calling it with & or \& yields different results.

I'll try to post selected parts of the stored procedures. I won't post it all, because most of it isn't really relevant.

The vParamBuca parameter is the one that causes the troubles. Values could be 'word&letter' or word\&letter.

SET @ricercaA = '''FORMSOF(INFLECTIONAL,"' +
    REPLACE(LTRIM(RTRIM(@vParamBuca)),' ', '") AND FORMSOF(INFLECTIONAL,"') + '")'''

The variable @ricercaA is then used to create the query string:

[...]
FROM Products AS FT_TBL
LEFT OUTER JOIN CONTAINSTABLE (Products, Sign1, '+ @ricercaA + ') AS ColSign1_0 ON FT_TBL.ID = ColSign1_0.[KEY]
LEFT OUTER JOIN CONTAINSTABLE (Products, ManufacturerAdditionalText, '+ @ricercaA + ') AS ColManufacturerAdditionalText_0 ON FT_TBL.ID = ColManufacturerAdditionalText_0.[KEY]
LEFT OUTER JOIN CONTAINSTABLE (Products, ManufacturerForSearch, '+ @ricercaA + ') AS ColManufacturer_0 ON FT_TBL.ID = ColManufacturer_0.[KEY]
LEFT OUTER JOIN CONTAINSTABLE (Products, TuttaLaRiga, '+ @ricercaA + ') AS ColTuttaLaRiga_0 ON FT_TBL.ID = ColTuttaLaRiga_0.[KEY]
[...]

EDIT 2

Many thanks to @srutzky for pointing me in the right direction! In the meanwhile, I also found a data inconsistency where one of the brands with the & in its name was modified not to have the &, and the other one wasn't modified (bottom line, my current problem is caused by that: a partial fix that was made by someone in the past).

Anyway, back on track. Now I understand that the & character in the CONTAINSTABLE function is treated as a logical AND (non bitwise).

I still need a solution for that. This answer gives a solution that doesn't work for me (the conditions are not the same as mine). How could I perform a CONTAINSTABLE search for a string with an ampersand in it? Preferably without having to transform the ampersand to another safe character?

回答1:

The odd behavior you are seeing is most likely due to the CONTAINS and CONTAINSTABLE functions (both used with SQL Server's Full Text Search feature) using the ampersand ( & ) character as equivalent to the AND operator. The following statement is taken from the documentation for CONTAINS:

The ampersand symbol (&) may be used instead of the AND keyword to represent the AND operator.

There is no mention of there being any escape character for it (and a back-slash isn't typically an escape character in SQL anyway).


UPDATE

Based on the information now provided in "Edit 2" of the Question, and additional research, I would say that you do not need to escape anything. It seems that putting the search phrases in double-quotes (as a result of using FORMSOF) treats the & as either a literal or a word-breaker, depending on the values on both sides of the &. Try the following examples:

DECLARE @Term NVARCHAR(100);

SET @Term = N'bob&sally'; -- 48 rows
--SET @Term = N'bob\&sally'; -- 48 rows
--SET @Term = N'r&f'; -- 4 rows
--SET @Term = N'r\&f'; -- 24 rows

SET @Term = N'FORMSOF(INFLECTIONAL,"' + @Term + '")';

SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 1);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 1);

The results for bob&sally and bob\&sally are the same, and in both cases bob and sally are separated and never combined into a single exact-match string.

The results between r&f and r\&f, however, are not the same. r&f is only ever treated as a single, exact-match string because r and f alone are not known words. On the other hand, adding in the back-slash separates the two letter since \ is a word-breaker, in which case you get both r and f.

Given that you stated in the Update that you have "data inconsistency, where one of the brands with the "&" in its name was modified not to have the "&", and the other one wasn't", I suspect that when you do not add in the \ character you get the brand that was not modified (since it is an exact match for the full term). But when you do add in the \ character, then you get the brand that was modified to have the & removed, since you are now searching on both pieces, each one matching part of that brand name.

I would fix the data to be consistent: update the brand names that had the & removed to put the ampersands back in. Then when people search using & without the extra \ added, it will be an exact match. This behavior will be consisted across the data, and will not require you adding code to circumvent the natural operation of FTS, which seems to be an error-prone approach.