I'm buildind an data warehouse. Each fact has it's timestamp. I need to create reports by day, month, quater but by hours too. Loking at the examples I see that dates tend to be saved in dimension tabels. alt starexample http://etl-tools.info/images/dw_star_schema.jpg
But I think, that it makes no sense for time. The dimension table would grow and grow. On the other hand JOIN with date dimension table is more efficent than using date/time functions in SQL.
What are your opinions/solutions ?
(I'm using Infobright)
I would recommend having seperate dimension for date and time. Date Dimension would have 1 record for each date as part of identified valid range of dates. For example: 01/01/1980 to 12/31/2025.
And a seperate dimension for time having 86400 records with each second having a record identified by the time key.
In the fact records, where u need date and time both, add both keys having references to these conformed dimensions.
Time should be a dimension on data warehouses, since you will frequently want to aggregate about it. You could use the snowflake-Schema to reduce the overhead. In general, as I pointed out in my comment, hours seem like an unusually high resolution. If you insist on them, making the hour of the day a separate dimension might help, but I cannot tell you if this is good design.
Kimball recommends having separate time- and date dimensions:
design-tip-51-latest-thinking-on-time-dimension-tables
My guess is that it depends on your reporting requirement. If you need need something like
meaning every day between 10:00:00 and 10:59:59, then I would use the time dimension, because it is faster than
because the date_part() function will be evaluated for every row. You should still keep the TimeStamp in the fact table in order to aggregate over boundaries of days, like in:
which gets awkward when using dimension fields.
Usually, time dimension has a minute resolution, so 1440 rows.