SQL design approach for searching a table with an

2020-02-26 10:17发布

Consider searching a table that contains Apartment Rental Information: A client using the interface selects a number of criteria that are represented as bit fields in the DB, for instance:

  • AllowsPets
  • HasParking
  • HasDeck
  • ModernKitchen

etc..

We are facing a situation where each new client of our software has additional fields they want to allow their end users to search on. The number of bit fields could reach into the hundreds.

I have three approaches that I'm considering and hoping for input and/or a different approach.

  • Current approach: Add more bit fields, sql queries are built dynamically and executed using EXEC: SET @SQL = @SQL + 'l.[NumUnits],' exec(@SQL))

Continue to add more bit fields. (table with 300 columns?)

  • Represent the data as a series of bits in one field. I'm unclear on if this approach will work, consider the 4 sample bit fields I offered above. The field could look like this: 1011 which would indicate false for 'hasparking' but true for all others. What I'm unclear on is how you would structure a query where you didn't care if it was false or true, for instance 1?11 where the person searching needs 1,3 and 4 to be true but doesn't care if 'HasParking' is true or false.

  • Move to an Attribute based approach where you have a table 'AttributeTypeID' and a table PropertyAttributes, which joins the PropertyID to the AttributeTypeId, new bit fields are simply a row in the AttributeTypeID table.

some other approach? Is this a well known SQL design pattern?

Thanks for any help

KM- EDIT PER COMMENTS


attribute table has a few other rows in it and is called listingattributes

CREATE TABLE [dbo].[ListingAttributes](
    [ListingID] [bigint] NOT NULL,
    [AttributeID] [int] IDENTITY(1,1) NOT NULL,
    [AttributeType] [smallint] NOT NULL,
    [BoardID] [int] NOT NULL,
    [ListingMLS] [varchar](30) NOT NULL,
    [PropertyTypeID] [char](3) NOT NULL,
    [StatusID] [varchar](2) NOT NULL,
PRIMARY KEY CLUSTERED 
(
    [AttributeID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]





;WITH GetMatchingAttributes AS
(
SELECT
    ListingID,COUNT(AttributeID) AS CountOfMatches
    FROM ListingAttributes

    WHERE 
    BoardID = 1
    AND
    StatusID IN ('A')
    AND
    --PropertyTypeID in (select * from @PropertyType)
    --AND
    AttributeType IN (2,3,6)
    GROUP BY ListingID
    HAVING COUNT(AttributeID)=(3)
)

SELECT 
    count(l.listingid)  
    FROM Listing l
        INNER JOIN GetMatchingAttributes m ON l.ListingID=m.ListingID
    --   where
    --   StatusID IN (select * from @Status)
    --AND
    --PropertyTypeID in (select * from @PropertyType)     1           1           0           NULL                           NULL                           1                                                                                                                                                                                                                                                                                                                                                                                                                                                                          NULL                                                                                                                                                                                                         1             NULL          NULL          NULL        0.1934759        NULL                                                                                                                                                                                                         NULL     SELECT                                                           0        NULL
  |--Compute Scalar(DEFINE:([Expr1006]=CONVERT_IMPLICIT(int,[Expr1012],0)))                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     1           2           1           Compute Scalar                 Compute Scalar                 DEFINE:([Expr1006]=CONVERT_IMPLICIT(int,[Expr1012],0))                                                                                                                                                                                                                                                                                                                                                                                                                     [Expr1006]=CONVERT_IMPLICIT(int,[Expr1012],0)                                                                                                                                                                1             0             0.001483165   11          0.1934759        [Expr1006]                                                                                                                                                                                                   NULL     PLAN_ROW                                                         0        1
       |--Stream Aggregate(DEFINE:([Expr1012]=Count(*)))                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        1           3           2           Stream Aggregate               Aggregate                      NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                       [Expr1012]=Count(*)                                                                                                                                                                                          1             0             0.001483165   11          0.1934759        [Expr1012]                                                                                                                                                                                                   NULL     PLAN_ROW                                                         0        1
            |--Filter(WHERE:([Expr1005]=(3)))                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   1           4           3           Filter                         Filter                         WHERE:([Expr1005]=(3))                                                                                                                                                                                                                                                                                                                                                                                                                                                     NULL                                                                                                                                                                                                         2471.109      0             0.00440886    9           0.1919928        NULL                                                                                                                                                                                                         NULL     PLAN_ROW                                                         0        1
                 |--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0)))                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      1           5           4           Compute Scalar                 Compute Scalar                 DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0))                                                                                                                                                                                                                                                                                                                                                                                                                     [Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0)                                                                                                                                                                9185.126      0             0.01422281    11          0.1875839        [Expr1005]                                                                                                                                                                                                   NULL     PLAN_ROW                                                         0        1
                      |--Stream Aggregate(GROUP BY:(.[dbo].[ListingAttributes].[ListingID]) DEFINE:([Expr1011]=Count(*)))                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             1           6           5           Stream Aggregate               Aggregate                      GROUP BY:(.[dbo].[ListingAttributes].[ListingID])                                                                                                                                                                                                                                                                                                                                                                                                                [Expr1011]=Count(*)                                                                                                                                                                                          9185.126      0             0.01422281    11          0.1875839        [Expr1011]                                                                                                                                                                                                   NULL     PLAN_ROW                                                         0        1
                           |--Index Seek(OBJECT:(.[dbo].[ListingAttributes].[_dta_index_ListingAttributes_BoardID_ListingID__AttributeType_PropertyTypeID_StatusID_6_7]), SEEK:(.[dbo].[ListingAttributes].[BoardID]=(1)),  WHERE:(.[dbo].[ListingAttributes].[StatusID]='A' AND (.[dbo].[ListingAttributes].[AttributeType]=(2) OR .[dbo].[ListingAttributes].[AttributeType]=(3) OR .[dbo].[ListingAttributes].[AttributeType]=(6))) ORDERED FORWARD)                                                                                                                             1           7           6           Index Seek                     Index Seek                     OBJECT:(.[dbo].[ListingAttributes].[_dta_index_ListingAttributes_BoardID_ListingID__AttributeType_PropertyTypeID_StatusID_6_7]), SEEK:(.[dbo].[ListingAttributes].[BoardID]=(1)),  WHERE:(.[dbo].[ListingAttributes].[StatusID]='A' AND (.[dbo].[ListingAttributes].[AttributeType]=(2) OR .[dbo].[ListingAttributes].[AttributeType]=(3) OR .[dbo].[ListingAttributes].[AttributeType]=(6))) ORDERED FORWARD  .[dbo].[ListingAttributes].[ListingID], .[dbo].[ListingAttributes].[AttributeID], .[dbo].[ListingAttributes].[AttributeType], .[dbo].[ListingAttributes].[StatusID]  16050.41      0.09677318    0.0315279     26          0.1283011        .[dbo].[ListingAttributes].[ListingID], .[dbo].[ListingAttributes].[AttributeID], .[dbo].[ListingAttributes].[AttributeType], .[dbo].[ListingAttributes].[StatusID]  NULL     PLAN_ROW                                                         0        1

(7 row(s) affected)



5条回答
姐就是有狂的资本
2楼-- · 2020-02-26 10:47

I suggest you go with the second approach, known as Entity-attribute-value model. It's probably the only approach that will scale as you need.

You could also have two searches, the basic and the advanced. You keep the attributes for the basic search in one table, and all the advanced attributes in the other table. This way at least the basic search will remain rapid as the number of attributes will grow with time.

查看更多
家丑人穷心不美
3楼-- · 2020-02-26 10:54

I've walked down this path a few times trying to store health status markers!

When I first started (in 2000?) I tried a character position approach (your #2) and found that it quickly became pretty unwieldy as I wrestled with the same questions over and over: "which position held 'Allows Pets' again?" or, worse yet, "how long is this string now? / which position am I on?" Can you work around this problem - developing objects to manage things for you? Well, yes, to an extent. But I really didn't appreciate how much extra work it cost compared to having the field identities managed for me by the database.

The second time around, I used an attribute/value pair approach similar to your solution #3. This basically worked and, for specialty needs, I still generate attribute/value pairs using a PIVOT. Also, my background is in AI and we used attribute/value pairs all the time in mechanical theorem proving so this was very natural for me.

However, there is a huge problem with this approach: pulling any one fact out ("Show me the apartments that allow pets") is easy but pulling all of the records meeting multiple constraints quickly gets very, very ugly (see my example below).

**SO...**I ended up adding fields to a table. I understand the theoretical reasons that Jon and 'Unknown' and 'New In Town' give for preferring other approaches and I'd have agreed with either or both at one point. But experience is a pretty harsh teacher...

A Couple More Things

First, I disagree that adding more bit fields is a nightmare of maintenance - at least compared with a character-bit approach (your #2). That is, having a distinct field for each attribute ensures that there is no 'management' necessary to figure out which slot belongs to which attribute.

Second, having 300 fields isn't really the problem - any decent database can do that without problem.

Third, your real issue and the source of pain is really the matter of dynamically generating your queries. If you are like me, this question is really all about "Do I really have to have this massive, grody and inelegant chain of "IF" statements to construct a query?"

The answer, unfortunately, is Yes. All three of the approaches you suggest will still boil down to a chain of IF statements.

In a database bit-field approach, you'll end up with a series of IF statements where all of your columns have to be added like so:

string SQL = "Select X,Y,Z Where ";

if (AllowsPets == 0)
  SQL += "(AllowsPets = 0) AND ";
else if (AllowsPets == )
  SQL += "(AllowsPets = 1) AND ";  // Else AllowsPets not in query
.
.
.
SQL = SQL.Substring(SQL.Length - 4);  // Get rid of trailing 'AND' / alternatively append '(1=1)'

In a character-position approach, you'll do the same thing but your "Appends" will add "0", "1" or "_" to your SQL. You'll also, of course, run into the maintenance issues deciding which one is which that I discussed above (enums help but don't completely solve the problem).

As mentioned above, the Attribute-Value approach is actually the worst. You'll have to either create a nasty chain of sub-queries (which surely will cause a stack overflow of some sort with 300 clauses) or you need to have an IF-THEN like this:

// Kill any previously stored selections.
SQLObject.Execute("Delete From SelectedApts Where SessionKey=X");
// Start with your first *known* attr/value and fill a table with the results.
.
.
Logic to pick first known attr/value pair
.
.
SQLObject.Execute("Insert Into SelectedApts Select X as SessionKey, AptID From AttrValue Where AllowsPets=1");

// Now you have the widest set that meets your criteria. Time to whittle it down.
if (HasParking == 1)
  SQLObject.Execute("Delete From SelectedApts Where AptID not in (Select AptID From AttrValue Where AllowsChildren=1));
if (AllowsChildren == 0)
  SQLObject.Execute("Delete From SelectedApts Where AptID not in (Select AptID From AttrValue Where AllowsChildren=0));
.
.
.
// Perform 2-300 more queries to keep whittling down your set to the actual match.

Now, you may be able to optimize this a bit so you run fewer queries (a PIVOT, sets of subqueries or using the UNION operator) but the fact is that this gets VERY expensive compared to the single query that you can use (but have to build) using the other approaches.

Thus, this is a painful kind of problem no matter what approach you take - there really is no magic that helps you to avoid it. But, having been there before, I would absolutely recommend approach #1.

Update: If you are really focused on pulling straight criteria matches ("All Apartments That Have A, B and C") and don't need other queries (like "...Sum(AllowsPets), Sum(AllowsChildren)..." or "...(AllowsPets=1) OR (AllowsChildren=1)...") then I really like KM's answer the more I look at it. It is very clever and looks likely to be acceptably fast.

查看更多
The star\"
4楼-- · 2020-02-26 11:02

something like this may work for you:

define tables:

CREATE TABLE #Apartments
(
     ApartmentID    int          not null primary key identity(1,1)
    ,ApartmentName  varchar(500) not null
    ,Status         char(1)      not null default ('A') 
    --....
)

CREATE TABLE #AttributeTypes
(
    AttributeType         smallint     not null primary key
    ,AttributeDescription varchar(500) not null
)

CREATE TABLE #Attributes  --boolean attributes, if row exists apartment has this attribute 
(
     ApartmentID     int not null --FK to Apartments.ApartmentID    
    ,AttributeID     int not null primary key identity(1,1)
    ,AttributeType   smallint  not null --fk to AttributeTypes
)

insert sample data:

SET NO COUNT ON
INSERT INTO #Apartments VALUES ('one','A')
INSERT INTO #Apartments VALUES ('two','A')
INSERT INTO #Apartments VALUES ('three','I')
INSERT INTO #Apartments VALUES ('four','I')

INSERT INTO #AttributeTypes VALUES (1,'dishwasher')
INSERT INTO #AttributeTypes VALUES (2,'deck')
INSERT INTO #AttributeTypes VALUES (3,'pool')
INSERT INTO #AttributeTypes VALUES (4,'pets allowed')
INSERT INTO #AttributeTypes VALUES (5,'washer/dryer')
INSERT INTO #AttributeTypes VALUES (6,'Pets Alowed')
INSERT INTO #AttributeTypes VALUES (7,'No Pets')

INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,1)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,2)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,3)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,4)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,5)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,6)

INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,1)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,2)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,3)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,4)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,7)

INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (3,1)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (3,2)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (3,3)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (3,4)

INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (4,1)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (4,2)
SET NOCOUNT OFF

sample search query:

;WITH GetMatchingAttributes AS
(
SELECT
    ApartmentID,COUNT(AttributeID) AS CountOfMatches
    FROM #Attributes
    WHERE AttributeType IN (1,2,3)  --<<change dynamically or split a CSV string and join in
    GROUP BY ApartmentID
    HAVING COUNT(AttributeID)=3--<<change dynamically or split a CSV string and use COUNT(*) from resulting table
)
SELECT
    a.*
    FROM #Apartments                      a
        INNER JOIN GetMatchingAttributes m ON a.ApartmentID=m.ApartmentID
    WHERE a.Status='A'
    ORDER BY m.CountOfMatches DESC

OUTPUT:

ApartmentID ApartmentName 
----------- --------------
1           one           
2           two           

(2 row(s) affected)

In the search query above, I just included a CSV string of atribute IDs to search for. In reality, you could create a Search stored procedure where you pass in a CSV parameter containing the IDs to search on. You can look at this answer to learn about loop free splitting of that CSV strings into table which you can join to. This would result in not needing to use any dynamic SQL.

EDIT based on the many comments:

if you add a few columns to the #AttributeTypes table you could dynamically build the search page. Here are a few suggestions:

  • Status: "A"ctive "I"nactive
  • ListOrder: can use this to sort by to build the screen
  • ColumnNumber: can help organize fields on the same screen row
  • AttributeGroupID: to group fields, see below
  • etc.

You could make all the fields checkboxes, or add another table called #AttributesGroups, and group some together and use radio buttons. For example, since "Pets Allowed" and "No Pets" are exclusive, add a row in the #AttributesGroups table "Pets". The application would group the attributes in the interface. Attributes in Groups would work the same as regular ungrouped attributes, just collect the selected IDs and pass it in to the search procedure. However, for each group you'll need to have the application include a "no preference" radio button and default it on. This option will not have an attribute ID and is not passed in, since you don't want to consider the attribute.

In my example, I do show an example of a "super attribute" that is in the #Apartments table, "Status". You should only consider major attributes for this table. If you start using these, you may want to alter the CTE to be FROM #Apartments with filtering on these fields and then join to #Attributes. However you will run into issues of Dynamic Search Conditions, so read this article by Erland Sommarskog.

EDIT on latest comments:

here is code to have a list of exclude attributes:

;WITH GetMatchingAttributes AS
(
SELECT
    ApartmentID,COUNT(AttributeID) AS CountOfMatches
    FROM #Attributes
    WHERE AttributeType IN (1,2,3)  --<<change dynamically or split an include CSV string and join in
    GROUP BY ApartmentID
    HAVING COUNT(AttributeID)=3--<<change dynamically or split a CSV string and use COUNT(*) from resulting include table
)
, SomeRemoved AS
(
SELECT
    m.ApartmentID
    FROM GetMatchingAttributes      m
        LEFT OUTER JOIN #Attributes a ON m.ApartmentID=a.ApartmentID 
            AND a.AttributeType IN (5,6)   --<<change dynamically or split an exclude CSV string and join in
    WHERE a.ApartmentID IS NULL
)
SELECT
    a.*
    FROM #Apartments           a
        INNER JOIN SomeRemoved m ON a.ApartmentID=m.ApartmentID
    WHERE a.Status='A'

I don't think I would go this way though. I'd go with the approach I outlined in my previous EDIT above. When include/exclude of an attribute is necessary, I'd just add an attribute for each: "Pets allowed" and "No Pets".

I updated the sample data from the original post to show this.

Run the original query with:

  • (..,..,6,..) to find apartments that allow pets
  • (..,..,7,..) to find apartments where no pets are allowed
  • (..,..,..) if there is no preference.

I think this is the better approach. When combined with the grouping idea and dynamically built search page described in the last edit, I think this would be better and would run faster.

查看更多
Evening l夕情丶
5楼-- · 2020-02-26 11:03

I've never tested this, but what if you were to create a varchar(256) fields that stored all of your flags as one long string of 0's and 1's.

For example,

  • AllowsPets = 1
  • HasParking = 0
  • HasDeck = 1
  • ModernKitchen = 1

would be:

  • PropertyFlags = 1011

and if you were looking for something that AllowsPets and HasDeck, then the search query would look something like this:

WHERE PropertyFlags LIKE '1_1_' (the underscore represents a single wildcard character in the like clause)

this would solve your issues with adding additional columns to the search in the future, but I'm not sure how this would do performance-wise.

has anyone out there tried anything similar to this?

查看更多
你好瞎i
6楼-- · 2020-02-26 11:13

Create a table that stores attributes or search columns based on the apartment. Definitely do not keep adding more bit field columns..nightmare maintenance and nightmare coding. And definitely don't please don't dynamically generate where statements and use exec.

查看更多
登录 后发表回答