I have a table that tracks inventory data by each individual piece. This is a simplified version of the table (some non-key fields are excluded):
UniqueID,
ProductSKU,
SerialNumber,
OnHandStatus,
Cost,
DateTimeStamp
Every time something happens to a given piece, a new audit record is created. For example, the first time my product ABC gets added to inventory I get a record like this:
1, ABC, 555, OnHand, $500, 01/01/2009 @ 02:05:22
If the cost of ABC serial number 555 changes, I get a new record:
2, ABC, 555, OnHand, $600, 01/02/2009 @ 04:25:11
If the piece is sold, I get yet another record:
3, ABC, 555, Sold, $600, 02/01/2009 @ 5:55:55
If a new piece of ABC is brought in, I get this record:
4, ABC, 888, OnHand, $600, 02/05/2009 @ 9:01:01
I need to be able to get on-hand inventory value for a given set of products at any point in time as fast as possible.
Using my example above, if I wanted to get my inventory value for product ABC as of 01/02/2009, I'd need to select, for each unique Product/SerialNumber combination, the single most recent record prior to 01/03/2009 with a status of "OnHand" and then add up the costs. (I'm not 100% sure what this select statement would look like at this point, but I'm going to experiment a bit).
My questions: is this a good structure for the type of audit table I'm describing? That is, does it lend itself to fast queries if indexed appropriately? (I'm trying to imagine what will happen when this table grows to millions of rows.)
Should I break out historical records into a separate table and only leave the most recent record for each ProductID/SerialNumber combo in the "active" table?
Any feedback/suggestions/comments/links are appreciated.
Thanks!
You'll make your life a lot easier having a table of live data separate from your audit data, it's a very good idea. For normal day-to-day operation you shouldn't need to even look at the audit data so having it the same table as your live data is just going to cause headaches.
The easiest way to manage this would be to put a trigger on the live table so that whenever a record is inserted/deleted/updated it automagically inserts a new record into the audit table.
edit: expanding on Kevin's thoughts on this, I would imagine that regardless of the serial number, all pieces sharing the same SKU would have the same price? If this is the case, having a separate price table is definitely a good idea too.
Not all values are updated at once, why reproduce all the information that is static? I think you should have different tables for serial number, status, and cost. Each of those tables will also have the product ID and the updated date.
This way, you could also easily tell which part of the product has changed. Before, you would have the compare all the fields of the product with all the fields of the product that was saved just before the first one.
You will need to separate your audit data. Keeping current data together with audit data will impact performance as time progresses.
The simplest implementation would be to create a separate database with the same schema as production. Add a datetime stamp to each table in the audit database. Make a composite primary key from the production primary key and the new datetime stamp.
Setup a trigger on the production database so each insert/update in the production database fires off an insert into the audit database. The values inserted into the audit database will be the newly inserted values.
Only use the audit database for audit reporting purposes.
Alternatively, You can also look at creating a data-mart which would be responsible for tracking changes over time. (But that takes a lot of time and effort)
First, a bit of definition (not clinical defs, just my own separation-of-ideas nomenclature):
==========
Initial table: The day-to-day table that you add to and retrieve from.
Audit table: The table holding multiple versions of any record in its related Initial table.
==========
If the business use of an audit table is to be able to tell what a record looked like at any point in time, I'd say it should be constructed identically to the initial table (plus the addition of a unique audit-ID).
If it is more important to know what a field value was at any point in time (as opposed to the record as a whole) is, then try the more abbreviated table-field-value-date approach. Please note that it takes a lot more work to reconstruct the entire record with this approach, so forget it if whole-record retrieval might ever become necessary.
Overall, I think that in most cases, fast performance using the most recent version of a record are more important than performance using audit data. Consequently, I'd suggest creating the audit table identically to the initial table (plus autonumbered surrogate key), and trigger an insert of the same data into the audit table when added to the initial table. This keeps the number of records relatively static in the initial table, and performance doesn't degrade over time.