Implementing and indexing User Defined Fields in a

2020-05-24 21:19发布

问题:

I need to store a large table (several millions or rows) that contains a large number of user-defined fields (not known at compile time, but probably around 20 to 40 custom fields). It is very important (performance-wise) for me to be able to query the data based on those custom fields: i.e. "Select the rows where this attribute has that value, that attribute is that value, etc.". Each query has some 20 to 30 WHERE clauses.

My ideas so far:

  1. Change the database schema everytime a new user field is implemented. Keep each user defined field as a column in the table. Add and maintain indexes on each custom-created column. How to properly build those indexes is a big problem, as I don't know what attributes(columns) will be used in the WHERE queries.

  2. Store the custom fields as an XML type column. As I understand from SQL2005 I can query inside the XML in the XML type columns. Not so sure about performance though.

  3. Entity Attribute Value . This is what I am using now, but it's a pain.

Any suggestions?

Edit: Some clarifications on my requirements. I have a table, 40 -50 million rows of (say) ID numbers and various attributes associated with those IDs.

Let's say 20 million of them have "CustomAttribute1" equal to 2, then 5 million have "CustomAttribute2" equal to "Yes" and 3 million have "CustomAttribute20" equal to 'No'

I need a FAST method of returning all IDs where:
     1. CustomAttribute1  = 2
     2. CustomAttribute2  = 'Yes'
     3. CustomAttribute4  = null
     4. CustomAttribute20  != 'No'  
  etc...

We have this implemented as EAV: the select query is a nightmare to implement and maintain, it takes a long time to return result, and most anoyingly the DB scales to huge sizes even for small ammounts of data, which is weird since the EAV is essentially normalizing the data but I assume all the indexes take up a bunch of space.

回答1:

It seems like you've listed your available options. EAV can be a pain for querying (and slow, depending on how many criteria you want to search on simultaneously), but it tends to be the most "sane" and RDBMS-agnostic implementation.

Modifying the schema is a no-no...obviously it can be done, but such a practice is abhorrent. I do not approve.

The XML option is a solution, and SQL Server can query inside the structure. I'm not certain about other RDBMS's, and you don't list which one you're using in the post or the tags.

If you're going to be querying on many attributes (say, 20+) simultaneously, then I would probably recommend the XML solution just to limit the number of joins you'll have to make. Aside from that, I would stick with EAV.



回答2:

You could represent all of the user defined fields with an XML Column, e.g.

"But I am not sure what the performance impact of doing this would be, however it is definitely the prettiest way of handling UDF's in a database in my opinion."

   <UDF>
      <Field Name="ConferenceAddress" DBType="NVarChar" Size="255">Some Address</Field>
      <Field Name="ConferenceCity" DBType="NVarChar" Size="255">Some City</Field>
      ...etc
   </UDF>

Then what I would do is put a trigger on the table so that when the column is updated it recreates a view for the table which pulls out the xml values as columns on the view. Lock the view etc during the recreation of it to prevent access errors application side.

Then I would create a stored procedure for updating the XML so that it would work for any XML Column following your User Defined Field xml formatting, e.g. Insert/Update/Remove/Get

GetUDFFieldValue AddUDFField UpdateUDFField DeleteUDFField

--Shared Parameters TableName ColumnName (e.g. use Dynamic SQL to get the XML from X table by X Column Name to make it universal/generic to all of your UDF Fields)

Here is an article on XML Performance Optimization from Sql Server 2005 (not seeing an equivalent in newer versions):

http://technet.microsoft.com/en-us/library/ms345118(v=sql.90).aspx

Lastly:

Are you sure you even need an RDBMS? NoSql Is a better fit for User Generated Fields, I might even consider using Both NoSql and Sql Server.