I am a newcomer to data warehouses and have what I hope is an easy question about building a star schema:
If I have a fact table where a fact record naturally has a one-to-many relationship with a single dimension, how can a star schema be modeled to support this? For example:
- Fact Table: Point of Sale entry (the measurement is DollarAmount)
- Dimension Table: Promotions (these are sales promotions in effect when a sale was made)
The situation is that I want a single Point Of Sale entry to be associated with multiple different Promotions. The Promotions cannot each be their own dimension, as there are far too many of them.
How do I do this?
For cases when you truly have a "multi-valued" dimension, a Bridge Table is usually the solution that Kimball recommends.
Your "Promotion" dimension simply is a record of each promotion, with its attributes (start date, end date, coupon code, POS promo code, Ad Name, etc). The relationship from promo to product isn't modeled here, since it will be reflected in the fact table.
Promotion/Discount Dimension would look like (1 row per unique planned promotion)
Your Sales Fact would look like:
Your "Promotion Group" bridge table would then be the set of combinations:
If a sale occurs that has 3 promotions on it, you simply create a group ID that relates to each of those promos, then put the group ID on the fact table. It's very similar to the way that medical reporting systems deal with multiple diagnoses.
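As a rough illustration of the three tables described above, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`dim_promotion`, `bridge_promotion_group`, `fact_sales`, `promotion_group_key`) and the helper `get_or_create_group` are my own assumptions for the example, not anything prescribed by Kimball, and a real ETL process would look up groups far more efficiently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_promotion (
    promotion_key INTEGER PRIMARY KEY,
    promo_name    TEXT
);
-- Bridge: one row per (group, promotion) membership.
CREATE TABLE bridge_promotion_group (
    promotion_group_key INTEGER,
    promotion_key       INTEGER REFERENCES dim_promotion(promotion_key),
    PRIMARY KEY (promotion_group_key, promotion_key)
);
-- Fact: carries a single group key instead of many promotion keys.
CREATE TABLE fact_sales (
    sale_id             INTEGER PRIMARY KEY,
    promotion_group_key INTEGER,
    dollar_amount       REAL
);
""")
conn.executemany(
    "INSERT INTO dim_promotion VALUES (?, ?)",
    [(1, "Spring Coupon"), (2, "Loyalty Discount"), (3, "Newspaper Ad")],
)

def get_or_create_group(conn, promo_keys):
    """Return the group key for this exact set of promotions (hypothetical helper).

    Reuses an existing group when the same combination was seen before,
    otherwise inserts a new group into the bridge table.
    """
    wanted = set(promo_keys)
    groups = {}
    for group_key, promo_key in conn.execute(
        "SELECT promotion_group_key, promotion_key FROM bridge_promotion_group"
    ):
        groups.setdefault(group_key, set()).add(promo_key)
    for group_key, members in groups.items():
        if members == wanted:
            return group_key
    new_key = max(groups, default=0) + 1
    conn.executemany(
        "INSERT INTO bridge_promotion_group VALUES (?, ?)",
        [(new_key, k) for k in sorted(wanted)],
    )
    return new_key

# A sale with 3 promotions gets one group key on the fact row.
g1 = get_or_create_group(conn, [1, 2, 3])
conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (1001, g1, 20.0))

# A later sale with the same combination reuses the group.
g2 = get_or_create_group(conn, [3, 2, 1])
conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (1002, g2, 35.5))

# A different combination gets a new group.
g3 = get_or_create_group(conn, [2])

# Walk fact -> bridge -> dimension to list the promotions on one sale.
promos_for_sale = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.promo_name
        FROM fact_sales f
        JOIN bridge_promotion_group b
          ON b.promotion_group_key = f.promotion_group_key
        JOIN dim_promotion p
          ON p.promotion_key = b.promotion_key
        WHERE f.sale_id = ?
        ORDER BY p.promotion_key
        """,
        (1001,),
    )
]
```

Reusing a group for a repeated combination keeps the bridge table at one set of rows per distinct combination rather than one set per sale.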
Note that a Bridge table makes it easy to double count sales: joining the fact table to the bridge repeats each sale once per promotion in its group. I advise that reports using this method be developed by folks who understand the model.
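To make the double-counting risk concrete, here is a small sketch (again Python with `sqlite3`; the table names and the sample amounts are invented for illustration). It compares a naive total taken across the bridge join with the total taken from the fact table alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    sale_id             INTEGER PRIMARY KEY,
    promotion_group_key INTEGER,
    dollar_amount       INTEGER
);
CREATE TABLE bridge_promotion_group (
    promotion_group_key INTEGER,
    promotion_key       INTEGER
);
-- Sale 1: $30 with three promotions (group 10).
-- Sale 2: $10 with one promotion (group 11).
INSERT INTO fact_sales VALUES (1, 10, 30), (2, 11, 10);
INSERT INTO bridge_promotion_group VALUES (10, 1), (10, 2), (10, 3), (11, 1);
""")

join_sql = """
    FROM fact_sales f
    JOIN bridge_promotion_group b
      ON b.promotion_group_key = f.promotion_group_key
"""

# Naive grand total across the join: sale 1 is counted three times.
naive_total = conn.execute(
    "SELECT SUM(f.dollar_amount)" + join_sql
).fetchone()[0]

# Correct grand total: do not join to the bridge at all.
true_total = conn.execute(
    "SELECT SUM(dollar_amount) FROM fact_sales"
).fetchone()[0]

# Per-promotion totals are each meaningful on their own
# ("sales that had this promotion"), but they must not be added together.
per_promo = dict(
    conn.execute(
        "SELECT b.promotion_key, SUM(f.dollar_amount)"
        + join_sql
        + " GROUP BY b.promotion_key"
    )
)

print(naive_total, true_total, per_promo)
```

With this sample data the joined total comes out at 100 while the real total is 40, which is exactly the kind of discrepancy a report writer needs to know to look for.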
You should load a fact record for each promotion, even if the dollar amount is the same. If, in fact, each type of promotion in your example is truly represented by this specific dollar amount, then a fact record should be loaded with the key of the promotion type, also containing keys back to other related dimensions (including Date).
The main point here is don't worry about data duplication. Think about a sales-oriented Data Warehouse for, say, a fast food company. One can assume there won't be just one fact record for $4.13 that is used to represent a million distinct sales of "value meal #3". Instead, each record in the "Transaction" dimension would have a relationship with at least one specific fact record in this hypothetical Sales fact table.
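A minimal sketch of this "one fact record per promotion" approach, in plain Python. The `SalesFact` fields, the `explode_sale` helper, and the use of `sale_id` to identify the originating transaction are all assumptions made for the example; amounts are kept in cents to avoid floating-point noise.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SalesFact:
    sale_id: int             # identifies the originating transaction (assumed)
    date_key: int            # key into the Date dimension
    promotion_key: int       # key into the Promotion dimension
    dollar_amount_cents: int

def explode_sale(sale_id, date_key, promotion_keys, dollar_amount_cents):
    """Load one fact row per promotion, repeating the same amount on each row.

    Note: a sale with no promotions yields no rows here; a real load would
    need something like a 'no promotion' key (an assumption, not shown).
    """
    return [
        SalesFact(sale_id, date_key, key, dollar_amount_cents)
        for key in promotion_keys
    ]

facts = (
    explode_sale(1, 20240309, [10, 11, 12], 413)  # $4.13 sale, 3 promotions
    + explode_sale(2, 20240309, [10], 413)        # $4.13 sale, 1 promotion
)

# Sales by promotion: a straightforward sum over the fact rows.
by_promo = {}
for fact in facts:
    by_promo[fact.promotion_key] = (
        by_promo.get(fact.promotion_key, 0) + fact.dollar_amount_cents
    )

# Overall sales: the amount is repeated per promotion, so count each sale once.
total_cents = sum({f.sale_id: f.dollar_amount_cents for f in facts}.values())
```

The per-promotion sums need no special handling, but any total that spans promotions has to collapse the repeated rows back to one per sale.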
Time is almost always a dimension in a star schema.
"In effect" suggests that there is a start and end date for a Promotion.
So a Promotion might itself be a fact that has a start and end date reference to the Time dimension.
Maybe with a model like this you could have a JOIN table to relate Sale to Promotion in a many-to-many fashion between facts.
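Here is one way that idea could be sketched, using Python's `sqlite3`. The table names, the sample promotions, and the rule "a promotion applies to every sale dated within its start and end dates" are assumptions for illustration; in practice you would probably also restrict by Product or Category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE promotion (
    promotion_id INTEGER PRIMARY KEY,
    name         TEXT,
    start_date   TEXT,   -- ISO dates compare correctly as text
    end_date     TEXT
);
CREATE TABLE sale (
    sale_id       INTEGER PRIMARY KEY,
    sale_date     TEXT,
    dollar_amount INTEGER
);
-- JOIN table relating Sale to Promotion many-to-many.
CREATE TABLE sale_promotion (
    sale_id      INTEGER REFERENCES sale(sale_id),
    promotion_id INTEGER REFERENCES promotion(promotion_id),
    PRIMARY KEY (sale_id, promotion_id)
);
INSERT INTO promotion VALUES
    (1, 'Spring Sale',    '2024-03-01', '2024-03-31'),
    (2, 'Weekend Coupon', '2024-03-09', '2024-03-10'),
    (3, 'Summer Sale',    '2024-06-01', '2024-06-30');
INSERT INTO sale VALUES
    (100, '2024-03-09', 25),
    (101, '2024-03-20', 40),
    (102, '2024-04-02', 15);
""")

# Populate the JOIN table with every promotion in effect on the sale date.
conn.execute("""
    INSERT INTO sale_promotion (sale_id, promotion_id)
    SELECT s.sale_id, p.promotion_id
    FROM sale s
    JOIN promotion p
      ON s.sale_date BETWEEN p.start_date AND p.end_date
""")

pairs = conn.execute(
    "SELECT sale_id, promotion_id FROM sale_promotion "
    "ORDER BY sale_id, promotion_id"
).fetchall()
print(pairs)
```

Sale 102 falls outside every promotion window, so it simply has no rows in the JOIN table.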
"Many, many" Promotions - yes, but how large is that? One per day means 365 records per year. I'll assume that Promotions are associated somehow with Products or Categories. A Sale would have a timestamp and multiple Products.
You have to store them somewhere, sometime, or your model falls apart. Why the reluctance to model Promotion that way?
My advice would be to not worry about the size of the data and concentrate on modeling the problem as best you can. Get the logical model right first, then worry about the physical model and the data sizes.