Is a fact table in normalized or de-normalized for

2020-03-15 01:23发布

问题:

I did a bit R&D on the fact tables, whether they are normalized or de-normalized. I came across some findings which make me confused.

According to Kimball:

Dimensional models combine normalized and denormalized table structures. The dimension tables of descriptive information are highly denormalized with detailed and hierarchical roll-up attributes in the same table. Meanwhile, the fact tables with performance metrics are typically normalized. While we advise against a fully normalized with snowflaked dimension attributes in separate tables (creating blizzard-like conditions for the business user), a single denormalized big wide table containing both metrics and descriptions in the same table is also ill-advised.

The other finding, which I also I think is ok, by fazalhp at GeekInterview:

The main funda of DW is de-normalizing the data for faster access by the reporting tool...so if ur building a DW ..90% it has to be de-normalized and off course the fact table has to be de normalized...

So my question is, are fact tables normalized or de-normalized? If any of these then how & why?

回答1:

From the point of relational database design theory, dimension tables are usually in 2NF and fact tables anywhere between 2NF and 6NF.

However, dimensional modelling is a methodology unto itself, tailored to:

  • one use case, namely reporting

  • mostly one basic type (pattern) of a query

  • one main user category -- business analyst, or similar

  • row-store RDBMS like Oracle, SQl Server, Postgres ...

  • one independently controlled load/update process (ETL); all other clients are read-only

There are other DW design methodologies out there, like

  • Inmon's -- data structure driven

  • Data Vault -- data structure driven

  • Anchor modelling -- schema evolution driven

The main thing is not to mix-up database design theory with specific design methodology. You may look at a certain methodology through database design theory perspective, but have to study each methodology separately.



回答2:

Most people working with a data warehouse are familiar with transactional RDBMS and apply various levels of normalization, so those concepts are used to describe working a star schema. What they're doing is trying to get you to unlearn all those normalization habits. This can get confusing because there is a tendency to focus on what "not" to do.

The fact table(s) will probably be the most normalized since they usually contain just numerical values along with various id's for linking to dimensions. They key with fact tables is how granular do you need to get with your data. An example for Purchases could be specific line items by product in an order or aggregated at a daily, weekly, monthly level.

My suggestion is to keep searching and studying how to design a warehouse based on your needs. Don't look to get to high levels of normalized forms. Think more about the reports you want to generate and the analysis capabilities to give your users.