Outlining the Role of a Data Warehouse


After completing this lesson, you will be able to:

  • Explain the role of a data warehouse.

Data Warehouse

A data warehouse is a central repository that stores historical data that has been collected from all systems across an organization. Its main purpose is to support analytics.

The key requirements of a data warehouse are:

  • provide the complete history of data across the organization
  • provide a complete view of business performance by combing data from all applications
  • provide a view of data at any level of granularity
  • provide a trusted source of data used by the entire organization

A data warehouse is a core part of a business intelligence (BI) solution and has these main functions:

  • Extract data from all data sources across an organization in real-time or periodically
  • Manage the data in a central repository
  • Provide sophisticated data modeling capabilities

A data warehouse should be capable of extracting data from any data source on any technology platform, regardless of whether it is cloud or on-premise.

During data extraction, data cleansing takes place. Data cleansing is the process of analyzing incoming data and thoroughly checking it for invalid or missing values. Incorrect values are immediately corrected and missing values are added. Invalid or incomplete data can be rejected.

Once data has passed through the cleansing stage it is then combined with related data from other sources. For example, sales data might be combined with delivery and billing data to provide a complete picture of the entire sales process.

Modern data warehouses not only collect historic data but are also able to access live data. This means an up-to-the-second view of business data is supported in conjunction with the possibility to navigate to any point in history.

A data warehouse is part of a continuous cycle in which data is generated by transactional systems and is then collected by the data warehouse. The data warehouse enriches the data with additional calculations, aggregations, conversions before it is consumed by analytical applications.

Data from all sources across the organization is combined in the data warehouse to provide the complete picture of business performance. The analytical applications generate insights which can be used by business leaders to act quickly on opportunities or to respond to risks that are highlighted.

A data warehouse architecture can be described by using the following layers. Click on the video below to learn about these layers.

Recap the key responsibilities of each of the data warehouse layers:

Data Modeling Layer
Build reusable models ready for analytics consumption.
Enrich the raw data with extra information useful for analytics.
Data Storage Layer
Create a trusted source of data for the modeling layer to consume.
Cleanse data, harmonize, check values, and add missing data.
Data Acquisition Layer
Extract data from all source systems periodically or in real-time.
Load data that has changed since the last load.

Building a data warehouse requires a well-trained team of specialists who understand the data sources and the analytical needs of the organization.

There are many questions to ask that will determine how the data warehouse will be built and the type of analytical services it will provide to the business.

Here are some examples of the important questions that need to be answered during the design phase of a data warehouse:


A data warehouse is sometimes mistakenly referred to as a data mart. A data mart is simply a 'mini' data warehouse used to manage only a limited part of an organization's data, such as sales or procurement, and is usually managed departmentally as opposed to centrally. One of the risks of developing decentralized data marts is the uncontrolled growth of data silos that create redundancies and inconsistencies. A data warehouse is a centralized, enterprise-wide resource. One copy of trusted data shared by the entire organization.

Log in to track your progress & complete quizzes