Data Modeling for EAV

2019-02-27 13:14发布

问题:

How are others using relational modeling tools to map a logical model or one in third normal form to a database that uses EAV?

回答1:

EAV is a non-relational design. You can't achieve any normal forms with EAV, because it fails to be a relation.

EAV is an example of the Inner-Platform Effect antipattern.

If you need many attributes, you could consider serializing to a blob of XML and storing that XML in a CLOB column.



回答2:

In relational it is possible to have an EAV (also called an attribute value pair or a name-value pair). It uses a fairly abstract data model. Before describing it, there are several caveats about it. it is hard to query and it does not perform well. EAV's are often used in relational for audit tables. The EAV stores the before image of a single column (before it changed). It stores one by one the one column that changed. If five attributes changed, five rows would be stored in the EAV. In addition, due to a potentially large natural primary key in EAV, often mirror tables are used instead of EAVs.

A EAV can be created in logical data modeling and in relational. This can be done with three related entities or tables:

  • The base entity (such as customer), which in analogous to a column family.
  • A "type" entity, which describes the attribute and its characteristics such as Net Worth Amount,
  • A "value" entity, which assigns the attribute to an instance of a base entity and gives it a value.

The base entity is the entity that has the varying characteristics.

The "type" entity is simply a code table identified by a type code and containing a description and other domain characteristics. Domain refers to data type, length, meaning, and units of measure, etc. It describes the attribute out of context (i.e., unassigned). An example could be Net Worth Amount, which is a number 8 digits with 2 decimal places, right justified, and its description is "a value representing the total financial value of a customer including liquid and non-liquid amounts".

The "value" entity is an associative entity or table, identified by the customer id and the attribute type code [both foreign keys], that has a value attribute that assigns the Net Worth Amount type the Customer and gives it a value, such as "$2,000,000." The value column is often given a generic name such as Field.

However, in SQL, name-value pairs are somewhat difficult to query and generally do not perform well. Say the "value" entity has an attribute called "Field". This is a character string. SQL typically uses the column name for result set headers. Say the field has Net Worth. But when it is displayed, header will be called "Field", not Net Worth. Field is the actual name of the column. It takes advanced SQL to get the desired header. This issue could be addressed by denormalizing the "type" and "value" entities into one. Instead of having three tables you have two -- one-to-many, a base table and a value table. Actually, that is essentially how Cassandra does it: the columns in a column family are a fully flattened attribute-value pair.Even if you denormalize in relational, the "field" column is still called "Field".

You could also flatten (denormalize) the three entities into one table in relational but there would be redundancy issues: Customer ID[FK], Attribute Type Code[FK], Customer base attribute(s), Attribute Type attributes, Field (for the values). I'm not advising it; I'm just saying.

In the early days of data modeling, data modelers loved to use EAVs because they appeared to be an elegant, flexible solution -- until DBAs got their hands on them. Consequently, EAV is sometimes used for a logical model, but should be fully denormalized in the physical one. When fully denormalized, it has a strong resemblance to Cassandra. I have used EAV in both logical and physical sometimes, with the issues identified above. I don't know how to add a graphic in these comments or I would include an illustration.