The first part of the diagram is the sources of data. These are databases from transactional systems. It can be in SQL Server or another relational database. It can also be from flat files like CSVs, Excel, XML, and text.
Afterward, you consolidate all the necessary records from the source into a single format called the staging area. For simplicity, you can also implement the staging area in SQL Server.
Then comes the data warehouse SQL Server with a dimensional model. We will discuss how to make one with an example later.
The final part of the diagram is different data marts. A data mart focuses on one aspect of the business, like sales, purchasing, and more. In this article, we will make a data warehouse with one data mart for insurance policy sales.
SQL Server data warehouse needs to be modeled for efficient processing. Next, weâll review its components, focusing on the dimensional model and its role.
Dimensional Model
Here are some key terms used in the dimensional model.
FACT TABLE
A fact table contains all the facts about a business entity or process. Itâs at the center of the schema, surrounded by dimensions. A fact table may be about sales, ticket support, projects, etc. You can implement this as an SQL database table. Columns include the ID keys of dimensions and measures.
Each record in the fact table will determine how detailed a fact table is. There can be several fact tables in a data warehouse defining different business processes in one warehouse. Each of them can share dimensions about location, date, and more.
DIMENSIONS
Dimension categorizes facts and measures in a fact table. For example, the city or region dimension describes a customerâs location in a sales transaction. Other examples of dimensions are customer and product in a sales business. Dimensions also enable users to answer a business question. For example, âHow much did we earn from Product X this month?â In this question, the Product is the dimension of a Sales fact.
Dimension is implemented as a table referenced by the fact table. It includes a primary key and the key description or name, such as a product ID and name. However, more can be defined within a dimension to categorize it and further build a hierarchy. For example, product category and subcategory describe a product.
The primary key of each dimension can be different from the primary key of the source table. This happens when a table of customers from one database is combined with a table of customers from another. Itâs also called a surrogate key.
MEASURES
The measure is a property of the fact table that allows calculation. This can be sum, average, count, minimum, or maximum. For example, you can sum sales amounts to form total sales.
Measures can be additive, non-additive, semi-additive, or calculated. The sales amount is an additive measure. You can sum or average it. But the unit price is non-additive. It may not make sense if you sum it up. Meanwhile, a calculated or computed measure is like its name. The total sales amount, for example, is calculated based on product unit price + tax.
Types of Data Warehouse Schema
The schema defines how information is structured in a data warehouse. Different schema types are designed to balance performance, storage efficiency, and query complexity. The choice of schema depends on the business requirements and data usage. Letâs review the most common types.
STAR SCHEMA
The simplest and the most widely used dimensional model is a star schema. It has the fact table at the center and the dimensions surrounding it. It can also be described as a parent-child table design. The fact table is the parent, while the dimensions are the children. But since itâs so simple, there are no grandchildren.
Common characteristics of star schema
- The fact table is at the center, containing dimension keys (foreign keys) and measures.
- Primary keys in dimension tables are foreign keys in the fact table.
- No dimension table references another dimension table. They are denormalized.
Advantages of star schema
- Simpler queries because of the simple design.
- Easily maintained.
- Faster access to records because of the denormalized dimension table design.