Data Warehouse Schema Design: Best Practices & Examples

Let’s deep dive into Data Warehousing Schema which are useful for structuring data warehouse tables or data marts. In this blog, we’ll explore schema types in Data Warehousing Modelling. Learn about the different types of schemas and their pros and cons. Let’s begin!

Is it possible to derive insights from raw data? Well, yes, however it can get tiresome, and the accuracy level may not be satisfactory. But if we have more clean and organized raw data, the process of extracting insights can get smoother. This is the role of Data Warehousing. It is a simple process that constructs a data warehouse with essential data. We can archive and store the essential data to be used in the future. Extraction, Transformation & Loading (ETL) can transform the very raw data into information.

What is Data Warehouse?

A Data Warehouse can be understood as a digital location that stores data extracted from various sources, like files and databases. To make business-centric decisions and solve business-centric problems with data, we’d need to mine the raw data. This is possible through the central data repository to avail insights and generate impactful reports. This works based on Online Analytical Processing (OLAP). It is an organization’s location for the storage of archived and historical data. All of the organizational data/information is stored in this one place. This is quite supportive in identifying trends from historical data.

What is a Schema?

Schema literally means the logical interpretation of the entire database. It connects links among the various database tables through values and keys. Just like a database, a data warehouse also comes with a schema. In Data Warehouse, we use modelling schemas like Star, Snowflake, and Galaxy.

Let’s talk about Data Warehouse Schema

The Data Warehouse Schema is a structure that rationally defines the contents of the Data Warehouse, by facilitating the operations performed on the Data Warehouse and the maintenance activities of the Data Warehouse system, which usually include the detailed description of the databases, tables, views, indexes, and Data, that are regularly structured using predefined design types such as Star Schema, Snowflake Schema, Galaxy Schema (also known as the Fact Constellation Schema), etc. The schema in a data warehouse is used to get knowledge of the complexity of the structure of the data warehouse.

First let’s understand fact & dimension schema

The very basic components of all of the data warehouse schemas are the facts and the dimension tables. The various combinations of these significant elements build most of the designs of data warehouse schemas.

Fact Tables 

A fact table should have data corresponding data to any business process and it stores quantitative information of analysis, or we can say it contains factual information in a table. 

Dimension Tables 

A Dimension table contains keys to facts present in the fact table and their corresponding attributes.

Now let’s get to the point. Let us understand the types of Data Warehouse Schema

Historically we can identify three prominent Data Warehouse Schema namely, Star Schema, Snowflake Schema and Galaxy Schema. Each of these schemas comes with very unique constraints in its design build-up.

1. Star Schema

The star schema of a data warehouse is one of the straight and simple designs. This schema allows for specific design parameters like permitting a few single-dimension tables to be joined to the main table or permitting only one central table. A Star Schema can be like a star having five-dimensional tables joined with one table at the centre.

In Star Schema there is one fact table in the middle and a number of associated dimension tables. It is known as the star schema because the entity-relationship diagram of this schema consists of a large fact table, and the points of the star are the dimension tables.

The following diagram shows the sales data with respect to the four dimensions, Dealer, Date, Branch, and Product.

In Star Schema each dimension has only one dimension table and each table holds a set of attributes. The dimension table should be joined to a fact table. 

For example, the Branch table contains the attribute set (Branch_ID, Name, Address, Country).

Benefits of Star Schema

Queries use very simple joints while retrieving the data and therefore query performance Is increased.
It is simple to retrieve data for reporting, at any point in time and for any period.

Disadvantages of Star Schema

If there are rapid changes in the requirements, the existing star schema is not recommended to be modified and reused in the long run.
Data redundancy is there in Star Schema because it’s only one dimension table of a fact table that’s why tables are not hierarchically divided.

2. Snowflake Schema

This Data Warehouse Schema, the Snowflake Schema encompasses a very logical arrangement of the dimension tables that we talked about previously. This schema types builds itself over Star Schema by adding more sub-dimensional tables. These tables are related to the first order dimension table which are joined to the fact table.

The snowflake schema describes the logical structure in much more detail as compared to star schema. Snowflake schema is more complex than Star schema but less complex than Galaxy Schema.

In the snowflake schema one fact table has a relationship with other dimension tables but unlike star schema, one or more dimension tables can have relationship with multiple dimension tables.

The dimension Tables are turned into more than one table until they are completely normalized.

Below is the diagram of Snowflake schema.

The goal of the snowflake schema is to normalize the denormalized data of the star schema. The snowflake structure materialized when the dimensions of a star schema are detailed and highly structured, having several levels of relationship, and the child tables have multiple parent tables. It solves some of the common problems associated with the star schema.

Benefits of Snowflake Schema

Data redundancy is completely removed by creating multiple level dimension table.
When comparing with star schema it takes less storage space as compare to star schema.
Snowflake schema is easy to update or maintain as compare the star schema

Disadvantages of Snowflake Schema

Due to normalized dimension tables, the ETL system has to load the number of tables.
You may need complex joins to perform a query due to the number of tables added. Therefore, query performance will be impacted.

3. Galaxy Schema

The Fact Constellation Schema also known as the Galaxy Data Warehouse Schema can be the next iteration in the Data Warehouse Schema. The Galaxy Schema makes use of the multiple facts table which are connected to the shared dimension tables, unlike the Star Schema or the Snowflake Schema. Galaxy Schema can be imagined as the Star schema which is entirely interlinked and normalized. There is no inconsistency or redundancy in the data.

A Galaxy Schema is also called fact constellation schema. Fact Constellation refers to combination of fact tables and dimension tables using joins. Multiple star schema is connected together to form galaxy schema.

Galaxy Schema contains multiple fact table that share dimension tables between them and the dimension table can also make relationship with other fact and dimension tables.

Below is the diagram of Galaxy Schema.

Benefits of Galaxy Schema

Highly flexible.
There are no data redundancy found in galaxy schema.
Low memory/space required.

Disadvantages of Galaxy Schema

Galaxy schema has Complicated design that’s why to create, implement and maintain galaxy schema is a tough job.
More complex queries are required for data mining operations because of higher number of joins used to connect fact and dimension tables.
Data analysis is difficult because of complex structure.

Let’s Understand the Key differences amongst the Star, Snowflake & Galaxy Schema types.

Let’s compare the three based on certain basics and understand how are they different. This can help us make more informed decisions when it comes to Data Warehousing.

	Star Schema	Snowflake Schema	Galaxy Schema
Elements	There are no sub-dimension tables attached here. There is a single fact table which is connected with multiple dimension tables.	Here there is a single fact table that connects with multiple dimension tables which in turn are connected to multiple sub-dimension tables.	Here there are multiple facts tables which are connected with the multiple dimension tables. These again connect with multiple sub-dimension tables.
Normalization	Star Schema is completely denormalized.	Snowflake schema is absolutely Normalized.	Galaxy Schema is entirely Normalized.
Number of Dimensions	There are multiple dimension tables that can map to the single fact table.	Here there are multiple dimension tables that can map to multiple dimension tables.	Here there are multiple dimension tables that can map to multiple dimension tables.
Performance	Here there are very few foreign keys which mostly results in very high performance.	Here there are high number of foreign keys, hence there is a decrease in performance from that at Star Schema.	This one is quite complex to be understood. Hence this schema type is mostly reserved for very complex data structures.
Complexity	This schema is designed to be easily understood. It is simple with least complications.	This schema type can be somewhat challenging to be understood. It is quite complex than Star Schema in its design.	This schema type is generally reserved for very complex data structures. It’s quite complex to be understood.
Storage Usage	There is data redundancy in this schema type hence it takes up high disk space.	Here data redundancy is limited hence it can take up less disk space.	This one is quite a sophisticated schema type. The data redundancy is very limited here. Hence it takes the least disk space.

Conclusion

We have been able to know about schemas, their different variations and the unique role they take up in data warehousing modelling. The understanding of schemas from this article should play a key role in enabling us to take more empowered decisions. Schemas help us to understand the business decisions related to the design of the data models. These schemas play a very huge role in turning raw data into meaningful information in the Data Warehouse. So, these are schemas that enable us to define the relation between different sets of data tables. With normalization and the number of fact tables, we are able to know what schema to build.

Please feel free to write to us if you have feedback to share or just want to talk!

Vikas Agarwal

Vikas Agarwal is the Founder of GrowExx, a Digital Product Development Company specializing in Product Engineering, Data Engineering, Business Intelligence, Web and Mobile Applications. His expertise lies in Technology Innovation, Product Management, Building & nurturing strong and self-managed high-performing Agile teams.