Once the high level design of the data warehouse is complete (layers), we then choose a data modeling approach. One of the most important considerations is how you will handle historization of data, especially of attributes that change either quickly or slowly over time, such as a product prices, or customer discount codes.
There are several choices of data model. These include:
- Normalized models (3NF / 2NF)
- Denormalized models (dimensional)
- Data Vault
One of the goals of data modeling is to balance the needs of the business users who requires unrestricted access to all data. This means access to all history and at any level of detail with good performance. But at the same time we must also consider how to store the data efficiently to avoid huge storage costs and to maintain good performance of the data not only for consumption (reporting) but also for data loading and housekeeping.
Components of the Data
Each of the data models uses database tables to store the data. But each model uses the tables in a different way.

There are three data elements to consider:
- Business Keys
- Relationships
- Context (descriptive) information
Each of these elements is stored in tables. Some data models combine the elements into a table and some models use separate tables for each element.
In Third Normal Form (3NF) modeling, each table can contains all three elements. Some tables contain two elements.
In Dimensional modeling, dimension tables always include business keys and context (descriptive) information, while fact tables include relationships, business keys, and in some cases context (descriptive) information.
Data Vault breaks things into their components; it decomposes data.
Data Vault models the component parts into:
- Hubs, which hold the business key but no context (descriptive) data.
- Links, which represent the business relationships, but no context (descriptive) data. A link connects two or more hubs.
- Satellites, which contain all context (descriptive) data and history. A satellite must have only one connection, which can be to a hub or a link.
For more information, see Hans Hultgren, Modeling the Agile Data Warehouse with Data Vault, November 2012.
Example: 3NF Model
Here is an example of a 3NF model:

The 3NF is a strictly defined database normalization rule. Beside 3NF, there is a number of other normal forms (such as 1NF, 2NF, Boyce-Codd normal form) that come with different formal rules. All normal forms have in common that they are supposed to reduce data redundancy and improve data integrity.
Essentially, the 3NF approach enforces the following criteria:
- Identify each set of related data with a primary key
- Eliminate repeating groups in individual tables
- Create a separate table for each set of related data
- No partial dependencies (values depend on the whole of every candidate key)
- No transitive dependencies (values depend only on candidate keys)
Example: Dimensional Model
Here is an example of a Dimensional model:

In dimensional modeling, data is separated into facts (measures) and dimensions (context for the measures). Conventionally, facts are numeric values while dimensions are groups of hierarchies and descriptors that define the facts. For example, the product price is a fact and the product description is a dimension. The key benefit of dimensional models is that they are relatively easy to understand for business users, because the structure is divided into measurements, facts and context, and dimensions. Also, the retrieval of data from the data warehouse tends to operate very quickly. These models are also referred to as star schema models.
Data Vault Method
Here is an example of a Data Vault model:

The main idea of the Data Vault is the explicit separation of structural information (in hubs and links) and descriptive context information/attributes (in satellites). It was developed by Dan Linsted and first described in a paper in 2000. Essentially, the Data Vault is a hybrid of the 3NF and the dimensional approach. It aims to remove the limitations of both models. Some regard Data Vault as too complex, but others consider Data Vault to be the ideal solution to cater for frequently changing models, because changes to the model do not destabilize the existing structure. Data Vault is also favored by those who requires strong auditability of data because it manages historical changes extremely well.
The main advantages of the Data Vault approach are:
- Flexibility of the data model in case of extensions
- Full uni-temporal historization
- Auditability of data
- No dependence between data load processes
- Strong schematization of data load processes/utilization of ETL templates
- High parallelization of data load processes
Data Vault - Hub Tables
In this course we will implement the Data Vault Model. So let's learn more about this model by breaking down each component, starting with the Hub tables.

A hub table represents a core object of the business logic and is used to store a business key, which can also consist of several keys if required. Other descriptive information is not modeled in the hub. A hub does not contain foreign keys.
Formal rules for business keys in hub tables:
- Business keys should be natural keys used by the business (for example, national insurance number).
- Business keys should stand alone and have meaning to the business.
- Business keys should never change. They should have the same semantic meaning and the same granularity.
Focusing on business keys (instead of focusing on source system surrogates) ensures that the result serves the needs of the business.
Data Vault - Link Tables
Now we move to the Link tables.

A link models the relationship between two or more hubs, in other words, the business relationship between two or more business objects. The relationship must always be unique and specific to the business rules. A relationship is always n:m.
Link table candidates are:
- Relationships/associations
- Foreign keys in OLTP systems
- Hierarchies and redefinitions
- Hierarchical relationships, which can be modeled by one link and two connections to hubs: HAL (parent-child LINK) and SAL (same-as LINK)
Transactions and events are often modeled as links (could also be a hub). For example, sales order or sensor data.
The modeling solution depends on the requirements, context, and so on.
Data Vault - Satellite Tables
And finally we come to the Satellite tables.

A satellite contains all non-key/context attributes and is connected to exactly one hub or link. Hub or link tables can have several satellite tables, organized for example by source system. They can contain in the extreme case one column only or any number of columns (and the mandatory columns). Note that in this course we work without historicized data, so, the satellites shown only represent present data.
The criteria to design satellite tables (that is, to separate data into different satellite tables) are:
- Source system
- Rate of change
- Data types (such as separate CLOBS or other lengthy textual fields)
- Attributes which change together (temporal normalization), such as address satellite
- Attributes which are queried together
- Privacy and data protection aspects
In our course, we will implement the data vault model. The data vault model is not supported by SAP BW/4HANA or SAP Datasphere so this could be a valid reason to choose a custom data warehouse approach.