Let’s deep dive into Data Warehousing Schema which are useful for structuring data warehouse tables or data marts. In this blog, we’ll explore schema types in Data Warehousing Modelling. Learn about the different types of schemas and their pros and cons. Let’s begin!
Is it possible to derive insights from raw data? Well, yes, however it can get tiresome, and the accuracy level may not be satisfactory. But if we have more clean and organized raw data, the process of extracting insights can get smoother. This is the role of Data Warehousing. It is a simple process that constructs a data warehouse with essential data. We can archive and store the essential data to be used in the future. Extraction, Transformation & Loading (ETL) can transform the very raw data into information.
A Data Warehouse can be understood as a digital location that stores data extracted from various sources, like files and databases. To make business-centric decisions and solve business-centric problems with data, we’d need to mine the raw data. This is possible through the central data repository to avail insights and generate impactful reports. This works based on Online Analytical Processing (OLAP). It is an organization’s location for the storage of archived and historical data. All of the organizational data/information is stored in this one place. This is quite supportive in identifying trends from historical data.
Schema literally means the logical interpretation of the entire database. It connects links among the various database tables through values and keys. Just like a database, a data warehouse also comes with a schema. In Data Warehouse, we use modelling schemas like Star, Snowflake, and Galaxy.
The Data Warehouse Schema is a structure that rationally defines the contents of the Data Warehouse, by facilitating the operations performed on the Data Warehouse and the maintenance activities of the Data Warehouse system, which usually include the detailed description of the databases, tables, views, indexes, and Data, that are regularly structured using predefined design types such as Star Schema, Snowflake Schema, Galaxy Schema (also known as the Fact Constellation Schema), etc. The schema in a data warehouse is used to get knowledge of the complexity of the structure of the data warehouse.
The very basic components of all of the data warehouse schemas are the facts and the dimension tables. The various combinations of these significant elements build most of the designs of data warehouse schemas.
A fact table should have data corresponding data to any business process and it stores quantitative information of analysis, or we can say it contains factual information in a table.
A Dimension table contains keys to facts present in the fact table and their corresponding attributes.
Historically we can identify three prominent Data Warehouse Schema namely, Star Schema, Snowflake Schema and Galaxy Schema. Each of these schemas comes with very unique constraints in its design build-up.
The star schema of a data warehouse is one of the straight and simple designs. This schema allows for specific design parameters like permitting a few single-dimension tables to be joined to the main table or permitting only one central table. A Star Schema can be like a star having five-dimensional tables joined with one table at the centre.
In Star Schema there is one fact table in the middle and a number of associated dimension tables. It is known as the star schema because the entity-relationship diagram of this schema consists of a large fact table, and the points of the star are the dimension tables.
This Data Warehouse Schema, the Snowflake Schema encompasses a very logical arrangement of the dimension tables that we talked about previously. This schema types builds itself over Star Schema by adding more sub-dimensional tables. These tables are related to the first order dimension table which are joined to the fact table.
The snowflake schema describes the logical structure in much more detail as compared to star schema. Snowflake schema is more complex than Star schema but less complex than Galaxy Schema.
In the snowflake schema one fact table has a relationship with other dimension tables but unlike star schema, one or more dimension tables can have relationship with multiple dimension tables.
The goal of the snowflake schema is to normalize the denormalized data of the star schema. The snowflake structure materialized when the dimensions of a star schema are detailed and highly structured, having several levels of relationship, and the child tables have multiple parent tables. It solves some of the common problems associated with the star schema.
The Fact Constellation Schema also known as the Galaxy Data Warehouse Schema can be the next iteration in the Data Warehouse Schema. The Galaxy Schema makes use of the multiple facts table which are connected to the shared dimension tables, unlike the Star Schema or the Snowflake Schema. Galaxy Schema can be imagined as the Star schema which is entirely interlinked and normalized. There is no inconsistency or redundancy in the data.
A Galaxy Schema is also called fact constellation schema. Fact Constellation refers to combination of fact tables and dimension tables using joins. Multiple star schema is connected together to form galaxy schema.
|Star Schema||Snowflake Schema||Galaxy Schema|
|Elements||There are no sub-dimension tables attached here. There is a single fact table which is connected with multiple dimension tables.||Here there is a single fact table that connects with multiple dimension tables which in turn are connected to multiple sub-dimension tables.||Here there are multiple facts tables which are connected with the multiple dimension tables. These again connect with multiple sub-dimension tables.|
|Normalization||Star Schema is completely denormalized.||Snowflake schema is absolutely Normalized.||Galaxy Schema is entirely Normalized.|
|Number of Dimensions||There are multiple dimension tables that can map to the single fact table.||Here there are multiple dimension tables that can map to multiple dimension tables.||Here there are multiple dimension tables that can map to multiple dimension tables.|
|Performance||Here there are very few foreign keys which mostly results in very high performance.||Here there are high number of foreign keys, hence there is a decrease in performance from that at Star Schema.||This one is quite complex to be understood. Hence this schema type is mostly reserved for very complex data structures.|
|Complexity||This schema is designed to be easily understood. It is simple with least complications.||This schema type can be somewhat challenging to be understood. It is quite complex than Star Schema in its design.||This schema type is generally reserved for very complex data structures. It’s quite complex to be understood.|
|Storage Usage||There is data redundancy in this schema type hence it takes up high disk space.||Here data redundancy is limited hence it can take up less disk space.||This one is quite a sophisticated schema type. The data redundancy is very limited here. Hence it takes the least disk space.|
We have been able to know about schemas, their different variations and the unique role they take up in data warehousing modelling. The understanding of schemas from this article should play a key role in enabling us to take more empowered decisions. Schemas help us to understand the business decisions related to the design of the data models. These schemas play a very huge role in turning raw data into meaningful information in the Data Warehouse. So, these are schemas that enable us to define the relation between different sets of data tables. With normalization and the number of fact tables, we are able to know what schema to build.
Please feel free to write to us if you have feedback to share or just want to talk!