I have a question related to database design. The database that I'm working with
requires data to treated in some way that it is never physically deleted. We started going
down a path of adding a "DeleteDateTime" column to some tables, that is NULL by default but
once stamped would mark a record as deleted.
This gives us the ability archive our data easily but I still feel in the dark on a few areas, specifically
whether this would be considered in line with best practices and also how to go about indexing these tables efficiently.
I'll give you an example: We have a table called "Courses" with a composite primary key made up of the columns "SiteID" and "CourseID".
This table also has a column called "DeleteDateTime" that is used in accordance with my description above.
I can't use the SQL Server 2008 filtered view feature because we have to be
SQL Server 2005 compatible. Should I include "DeleteDateTime" in the clustered index for this table? If so should it be
the first column in the index (i.e. "DeleteDateTime, SiteID, CourseID")...
Does anyone have any reasons why I should or shouldn't follow this approach?
Thanks!
Is there a chance you could transfer those "dead" records into a separate table? E.g. for your Courses
table, have a Courses_deleted
table or something like that, with an identical structure.
When you "delete" a record, you basically just move it to the "dead table". That way, the index on your actual, current data stays small and zippy....
If you need to have an aggregate view, you can always define a Courses_View
which unions the two tables together.
Your clustered index on your real table should be as small, static and constant and possible, so I would definitely NOT recommend putting such a date time column into it. Not a good idea.
For excellent info on how to choose a good clustering key, and what it takes, check out Kimberly Tripp's blog entries:
- GUIDs as PRIMARY KEYs and/or the clustering key
- The Clustered Index Debate Continues...
- Ever-increasing clustering key - the Clustered Index Debate..........again!
Marc
what's your requirements on data retention? have you looked into an audit log instead of keeping all non-current data in the database?
I think you have it right on the head for the composite indexes including your "DeleteDateTime" column.
I would create a view that is basically
select {List all columns except the delete flag}
from mytable
where deletflag is null
This is what I would use for all my queries on the table. The reason why is to prevent people from forgetting to consider the deleted flag. SQL Server 2005 can easily handle this kind of view and it is necessary if you are goin to use thisdesign for delting records. I would have a separate index on the delted column. I likely would not make it part of the clustered index.