What are the best practices for modeling inheritance in databases?
What are the trade-offs (e.g. queriability)?
(I'm most interested in SQL Server and .NET, but I also want to understand how other platforms address this issue.)
What are the best practices for modeling inheritance in databases?
What are the trade-offs (e.g. queriability)?
(I'm most interested in SQL Server and .NET, but I also want to understand how other platforms address this issue.)
There are several ways to model inheritance in a database. Which you choose depends on your needs. Here are a few options:
Table-Per-Type (TPT)
Each class has its own table. The base class has all the base class elements in it, and each class which derives from it has its own table, with a primary key which is also a foreign key to the base class table; the derived table's class contains only the different elements.
So for example:
Would result in tables like:
Table-Per-Hierarchy (TPH)
There is a single table which represents all the inheritance hierarchy, which means several of the columns will probably be sparse. A discriminator column is added which tells the system what type of row this is.
Given the classes above, you end up with this table:
For any rows which are rowtype 0 (Person), the startdate will always be null.
Table-Per-Concrete (TPC)
Each class has its own fully formed table with no references off to any other tables.
Given the classes above, you end up with these tables:
Note that some database engines already provides inheritance mechanisms natively like Postgres. Look at the documentation.
For an example, you would query the Person/Employee system described in a response above like this:
In that is your database's choice, you don't need to be particularly smart !
You would normalize of your database and that would actually mirror your inheritance. It might have performance degradance, but that's how it is with normalizing. You probably will have to use good common sense to find the balance.
Using SQL ALchemy (Python ORM), you can do two types of inheritance.
The one I've had experience is using a singe-table, and having a discriminant column. For instances, a Sheep database (no joke!) stored all Sheep in the one table, and Rams and Ewes were handled using a gender column in that table.
Thus, you can query for all Sheep, and get all Sheep. Or you can query by Ram only, and it will only get Rams. You can also do things like have a relation that can only be a Ram (ie, the Sire of a Sheep), and so on.
Short answer: you don't.
If you need to serialize your objects, use an ORM, or even better something like activerecord or prevaylence.
If you need to store data, store it in a relational manner (being careful about what you are storing, and paying attention to what Jeffrey L Whitledge just said), not one affected by your object design.
TPT, TPH and TPC patterns are the ways you go, as mentioned by Brad Wilson. But couple of notes:
child classes inheriting from a base class can be seen as weak-entities to the base class definition in the database, meaning they are dependent to their base-class and cannot exist without it. I've seen number of times, that unique IDs are stored for each and every child table while also keeping the FK to the parent table. One FK is just enough and its even better to have on-delete cascade enable for the FK-relation between the child and base tables.
In TPT, by only seeing the base table records, you're not able to find which child class the record is representing. This is sometimes needed, when you want to load a list of all records (without doing
select
on each and every child table). One way to handle this, is to have one column representing the type of the child class (similar to the rowType field in the TPH), so mixing the TPT and TPH somehow.Say we want to design a database that holds the following shape class diagram:
The database design for the above classes can be like this: