Time and date dimension in data warehouse

I'm buildind an data warehouse. Each fact has it's timestamp. I need to create reports by day, month, quater but by hours too. Loking at the examples I see that dates tend to be saved in dimension tabels. alt starexample http://etl-tools.info/images/dw_star_schema.jpg

But I think, that it makes no sense for time. The dimension table would grow and grow. On the other hand JOIN with date dimension table is more efficent than using date/time functions in SQL.

What are your opinions/solutions ?

(I'm using Infobright)

标签： data-warehouse infobright

4条回答

相关推荐>>

2楼-- · 2019-03-15 12:44

I would recommend having seperate dimension for date and time. Date Dimension would have 1 record for each date as part of identified valid range of dates. For example: 01/01/1980 to 12/31/2025.

And a seperate dimension for time having 86400 records with each second having a record identified by the time key.

In the fact records, where u need date and time both, add both keys having references to these conformed dimensions.

0人赞添加讨论(0) 举报

仙女界的扛把子

3楼-- · 2019-03-15 12:49

Time should be a dimension on data warehouses, since you will frequently want to aggregate about it. You could use the snowflake-Schema to reduce the overhead. In general, as I pointed out in my comment, hours seem like an unusually high resolution. If you insist on them, making the hour of the day a separate dimension might help, but I cannot tell you if this is good design.

0人赞添加讨论(0) 举报

混吃等死

4楼-- · 2019-03-15 13:01

Kimball recommends having separate time- and date dimensions:

design-tip-51-latest-thinking-on-time-dimension-tables

In previous Toolkit books, we have recommended building such a dimension with the minutes or seconds component of time as an offset from midnight of each day, but we have come to realize that the resulting end user applications became too difficult, especially when trying to compute time spans. Also, unlike the calendar day dimension, there are very few descriptive attributes for the specific minute or second within a day. If the enterprise has well defined attributes for time slices within a day, such as shift names, or advertising time slots, an additional time-of-day dimension can be added to the design where this dimension is defined as the number of minutes (or even seconds) past midnight. Thus this time-ofday dimension would either have 1440 records if the grain were minutes or 86,400 records if the grain were seconds.

0人赞添加讨论(0) 举报

仙女界的扛把子

5楼-- · 2019-03-15 13:06

My guess is that it depends on your reporting requirement. If you need need something like

WHERE "Hour" = 10

meaning every day between 10:00:00 and 10:59:59, then I would use the time dimension, because it is faster than

WHERE date_part('hour', TimeStamp) = 10

because the date_part() function will be evaluated for every row. You should still keep the TimeStamp in the fact table in order to aggregate over boundaries of days, like in:

WHERE TimeStamp between '2010-03-22 23:30' and '2010-03-23 11:15'

which gets awkward when using dimension fields.

Usually, time dimension has a minute resolution, so 1440 rows.

0人赞添加讨论(0) 举报

Time and date dimension in data warehouse

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间