Why is there no delimited, integer-only cataloging?

Posted 2019-06-15 00:36

Question:

Is there something I'm missing?

What I am trying to create is basically a table of indexes separated by spaces (or whatever delimiter you fancy). I realize that Full Text Search would not be possible on purely int-typed columns, because it treats spaces as the delimiter that separates the data to be indexed across the whole catalog.

I do realize that it allows me to index varbinary data, but why not just int data separated by spaces, rather than requiring a mix of integer AND text data to search through? I.e., a query like

 SELECT * FROM MyTable
 WHERE CONTAINS(indexedColumn, '1189')

with a full text index/catalog defined for a table that looks like:

 indexedColumn      secondDelimitedIntColumn
 1189               34 34209 1989 3 5

is not possible, but

 SELECT * FROM MyTable
 WHERE CONTAINS(uniqueColumn, 'a1189')

WOULD work using the full text index on a table with the following columns:

 uniqueColumn secondDelimitedIntColumn
 a1189        b34 b34209 b1989 b3 b5  

So, basically, executing a CONTAINS() search on a column with a full text index works only if there is some text attached to each integer string.

But my question is: "Why can't I just use strings of integers separated by spaces, which would save me the step of adding dummy text just to trick SQL Server into allowing a full text search on indexed integer strings?"

Thanks in advance!

Answer 1:

This isn't really a question. There are no details about the query you are attempting to run or the schema you are running it against. I'm not exactly sure what to tell you here. I might be able to help if you provide some details. It reads more like a complaint than a question.

I'm fully aware this should be in the comments section and not an answer, but I don't have the points for that on Stack Overflow. I live on .dba.



Answer 2:

Updated with XML example, below

Your current design violates 1st normal form.

That, in itself, is okay. Over some years, I've inherited and had to maintain several systems that did so. I don't know why they were built that way. It doesn't really matter. They had to be maintained and the schedule wasn't always such that there was time for refactoring, testing and validation, not to mention doing so for the stack of apps that were built upon them.

Looking back now, though, I can easily spot the one attribute that they all shared. It was the absolute biggest barrier to optimizing and extending these systems: the underlying "relational" database violated 1st normal form. It was the root cause of virtually every technical "gotcha" and virtually every performance problem we encountered. Splitting strings. Creating a faux datatype system to validate them. Creating further delimited attributes to describe them. Creating special rules for each delimited "location" and having to implement an EVAL function in many systems to enforce them. Using dynamic SQL or worse to search it all. It took more "clever" programming to implement what seemed like conceptually simple features than I care to recollect.

Maybe your system is different. Maybe 40+ years of relational database research does not apply to your situation. For your sake, I truly hope so. The only problem is that you're using a relational database in a non-relational way. Just like you can pound screws with a hammer, and you can pull a boat with a motorcycle (don't hit the brakes if you actually get it going), you can create an index (full-text or b-tree) on text that represents integers.

But why would you do any of these things? Why wouldn't you actually store the integers as integers and enjoy type-safety? Why wouldn't you normalize this into two related tables to take advantage of smaller transactions and more indexing options? If you've inherited a system that you can't change, then please say so and people might be able to help with alternatives (TVPs and XML have been rightfully mentioned). But I can't see coming into the situation saying that your hammer and motorcycle are broken because they don't drive screws and pull boats very well.
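To make the "two related tables" suggestion concrete, here is a minimal sketch of what a normalized layout might look like. The table and column names (MyEntity, MyEntityInt, IntValue) are hypothetical, and the composite primary key assumes that a value appears at most once per parent row and that the order of the values doesn't matter; neither assumption comes from the question.

```sql
-- Parent table: one row per original record (hypothetical name)
create table MyEntity (
      ID int not null primary key identity
);

-- Child table: one row per integer that used to live in the delimited string
-- (hypothetical name; assumes no duplicate values per parent row)
create table MyEntityInt (
      EntityID int not null references MyEntity (ID)
    , IntValue int not null
    , primary key (EntityID, IntValue)
);

-- Ordinary b-tree index to support lookups by value
create index IX_MyEntityInt_IntValue on MyEntityInt (IntValue, EntityID);

-- Find every parent row that holds the value 1189
select e.ID
from MyEntity e
where exists (
    select 1
    from MyEntityInt i
    where i.EntityID = e.ID
        and i.IntValue = 1189
);
```

With this shape the values are real integers, so no string splitting, dummy prefixes, or full-text catalog is needed to search them.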

All that said (maybe somebody, somewhere is rethinking an ill-advised design), I've put LIKE to good use when searching delimited strings:

-- Setup demo data
declare @delimitedInts table (
    data varchar(max) not null
)
insert into @delimitedInts select '0,1,2'
insert into @delimitedInts select '1,2,3,4'
insert into @delimitedInts select '5,10'

-- Create a search term
declare @searchTerm int = 2

-- Get all rows that contain the searchTerm
select data
from @delimitedInts
where ',' + data + ',' like '%,' + cast(@searchTerm as varchar(11)) + ',%'

-- Create many search terms
declare @searchTerms table (
    searchTerm int not null primary key
)
insert into @searchTerms select 2
insert into @searchTerms select 3
insert into @searchTerms select 4

-- Get all rows that contain ANY of the searchTerms
select distinct a.data
from @delimitedInts a
    join @searchTerms b on ',' + a.data + ',' like '%,' + cast(b.searchTerm as varchar(11)) + ',%'

-- Get all rows that contain ALL of the searchTerms
select a.data
from @delimitedInts a
    join @searchTerms b on ',' + a.data + ',' like '%,' + cast(b.searchTerm as varchar(11)) + ',%'
group by a.data
having count(*) = (select count(*) from @searchTerms)

Is this too slow for you? Maybe. Have you actually measured it? At least you could get an implementation in place and prove that it works before you optimize it.
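One rough way to measure it, as a sketch: wrap the query in SQL Server's `set statistics time` and `set statistics io` session options and read the figures on the Messages tab. The three-row table variable below only mirrors the demo data above; timings are only meaningful against a realistic volume of your own data.

```sql
-- Demo data only; substitute your real table to get meaningful numbers
declare @delimitedInts table (
    data varchar(max) not null
);
insert into @delimitedInts select '0,1,2';
insert into @delimitedInts select '1,2,3,4';
insert into @delimitedInts select '5,10';

declare @searchTerm int = 2;

-- Report CPU/elapsed time and logical reads for the statements that follow
set statistics time on;
set statistics io on;

select data
from @delimitedInts
where ',' + data + ',' like '%,' + cast(@searchTerm as varchar(11)) + ',%';

set statistics io off;
set statistics time off;
```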

Update: XML

I've done a little testing on converting your space-delimited column to an XML column and querying it, including doing so with XML indexes. Unfortunately, you can't put an XML index on a computed column, so I'm using a trigger to keep an XML column automatically updated. Here are some interesting results (note the SQL comments):

-- Create a demo table
create table MyTable (
      ID int not null primary key identity
    , SpaceSeparatedInts varchar(max) not null
    --, ComputedIntsXml as cast('<ints><i>' + replace(SpaceSeparatedInts, ' ', '</i><i>') + '</i></ints>' as xml) persisted -- Can't use XML index
    , IntsXml xml null
)
go
-- Create trigger to update IntsXml
create trigger MyTable_Trigger on MyTable after insert, update as begin
    update m
    set m.IntsXml = cast('<ints><i>' + replace(m.SpaceSeparatedInts, ' ', '</i><i>') + '</i></ints>' as xml)
    from MyTable m
        join inserted i on m.ID = i.ID
end
go
-- Add some demo data
insert into MyTable (SpaceSeparatedInts) select '1'
insert into MyTable (SpaceSeparatedInts) select '1 2'
insert into MyTable (SpaceSeparatedInts) select '2 3 4'
insert into MyTable (SpaceSeparatedInts) select '5 6 7 10'
insert into MyTable (SpaceSeparatedInts) select '100 10 1000'
go

-- Search for the number 10 (and use this same query in subsequent testing, below)
select *
from MyTable
where IntsXml.exist('/ints/i[. = "10"]') = 1
-- This query spends virtually all of its time running an XML Reader and an XPath filter

-- Add a primary xml index
create primary xml index IX_MyTable_IntsXml on MyTable (IntsXml)
-- The query now uses a clustered index scan and clustered index seek on PrimaryXML

-- Add secondary xml index for value
create xml index IX_MyTable_IntsXml_Value on MyTable (IntsXml) using xml index IX_MyTable_IntsXml for value
-- No change

-- Add secondary xml index for path
create xml index IX_MyTable_IntsXml_Path on MyTable (IntsXml) using xml index IX_MyTable_IntsXml for path
-- No change

-- Add secondary xml index for property
create xml index IX_MyTable_IntsXml_Property on MyTable (IntsXml) using xml index IX_MyTable_IntsXml for property
-- The query now replaces the clustered index scan on PrimaryXML with an index seek on SecondaryXML

While it is clearly a different method, is this faster than LIKE? You have to test in your environment. Hopefully this will give you some ideas of how to do so. Please let me know how this works out for you, if it's doable in your shop.



Answer 3:

I'm not certain I understand what you are looking for either, but if you want to store multiple values in a single column, your best bet is going to be to use XML.

See this post for more info on the concept.

Querying XML columns in SQLServer 2005
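As a small illustration of the idea (a sketch, not taken from the linked post), the xml type's `nodes()` and `value()` methods can turn the stored values back into rows of real integers. The element names `ints` and `i` are assumptions chosen for this example.

```sql
-- Assumed XML shape: one <i> element per stored integer
declare @x xml = '<ints><i>5</i><i>6</i><i>7</i><i>10</i></ints>';

-- Shred the XML into one row per <i> element, typed as int
select n.i.value('.', 'int') as IntValue
from @x.nodes('/ints/i') as n(i);
```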