Tuesday, June 9, 2009

Star schema vs Snowflake Schema

What is star schema? Star schema is data warehousing data model which resembles to star. There are one or more fact table connected with multiple dimensional tables. Center of the star consist of one or more fact table and fact is pointing to different dimension tables. Dimension tables have a simple primary key, while fact tables have a composit primary key consisting of the aggregate of relevant dimension keys.

Fact table consists of Measurements or facts of business process. It is centeralized table in star schema, called FACT. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. Fact table contains the content of the Datawarehouse. A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables). In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called as Factless Fact tables.

Factless Fact means only the key available in the Fact and there are no measures available. Factless fact can have only keys or keys with count of occurrences/ events (For instance, no of accident in a month, no of policies has been closed in a month). It is used to support negative analysis report. For example a Store that did not sell a product for a given period. It is used to join the dimension tables.

Dimension table is a parent table which is connected with Fact table. Dimension has attributes which are normally descriptive and textual values. For instance, SALES table is fact table and possible dimension tables are TIME, PRODUCT, REGION, SALESPERSON, etc. Dimension never have foreign key in star schema data model.

Slowly changing dimension(SCD) applies to cases where the attribute for a record varies over time. We have three type of SCD, Type1, Type2 and Type3.

Type1: The new record replaces the original record. No trace of the old record exists

Type2: A new record is added into the customer dimension table. Therefore, the customer is treated essentially as two people.

Type3: The original record is modified to reflect the change

To learn more about SCD, please click SCD

Here is sample data model for star schema. In the below data model, the dimensions are products, customers, time and locations. The fact table is sales and it is connected with all dimension. Now we can see the fact data with different dimensions. The fact table Foreign key will be connected with primary key of dimension table. We never connect the one dimension table to another dimension table. Oracle optimizer understand the star query and generate the different execution plan to improve the performance.

What is snowflake schema? The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. In a star schema, each dimension is represented by a single dimensional table, whereas in a snowflake schema, that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.

Here is sample data model for snowflake schema.. Here customer dimension is normalized into another lookup table customertype. The same way, we can also normalize the customer dimension into another lookup table as country. Location dimension is normalized into territory. Product dimension can also be normalized into another lookup table as supplier...

Advantages of Snowflake...

1. No redundancy and hence more easy to maintain and change.
2. When we normalize the dimension, the dimension will not be sparsed.
3. It reduce the disk space since there is no redundancy. But in real world scenario, dimension is small table and disk space is not a major issue.

Disadvantages of Snowflake...

1. You need to join more tables when we write the query. It is less easy to understand.
2. In snowflake model, query process time will increase, since we are joining multiple tables.

When should we go for snowflake schema? Dimension has many columns, and the data are very sparse, most of the fields has no data, also disk space is a concern, then we can think of snowflake. Since dimension tables hold less space, snow flake schema approach may be avoided in some organization. We can go for snowflake model when we have small data mart or datawarehouse. Star schema model is good for bigger data mart/warehouse. To learn more about snowflake schema, please click.

To learn more about dimensional data modeling, please see these links. Link1 and Link2

No comments: