Star vs Snowflake schema in data warehousing?

2019-03-17 01:07发布

问题:

Currently, I've been involved in an warehouse based intelligent transaction analysis banking system featuring customer churn behavior, fraud detection & CRM analysis. We've been using Oracle as the database & it's completely a data warehousing project with data mining algorithms used for analysis.

We have records of about 1000 customers of a bank. For modeling, whether it is better to use the star schema or snowflake schema or constellation schema? I know the basic difference of star and snowflake schema- normalization of dimension table occurs in snowflake (a.k.a. snowflaking) schema which may be problematic for joining in case of large-sized database.

So, which schema would be better for my case? Answers from experienced programmers involved in data warehousing are highly welcomed!

Thanks in advance!

回答1:

In brief, my assumption going into a project like this would be that a star schema would be appropriate. I might modify that if it appeared that a dimension was getting too large to efficiently full scan and the efficiency of queries against it could be meaningfully improved by snowflaking unless that dimension joined to the fact table on a partitioning key (due to difficulties in applying partition pruning on a predicate placed on a snowflaked dimension).