Implementing the Data Warehouse Architecture in SAP HANA

Objective

After completing this lesson, you will be able to create an SAP HANA project structure based on the reference data warehouse architecture

Introduction

Launch the video below to get started.

Implement the Reference Data Warehouse Architecture

The Basic Layers

In the previous unit we introduced an example of a reference SAP SQL Data Warehouse architecture. Now, in this unit, you will see how the architecture is implemented in Web IDE.

In this course we will implement the following layers:

  • Acquisition Layer - we are implementing a staging area for our persistent integration. We will not be implementing virtual or replicated data access, so we will not need those areas of the data acquisition layer. In our project we will call this layer STAGE
  • Raw Layer - we will load this layer with transaction data that does not need transforming. We will call this layer RAW
  • Business Integrated Layer - we will load this layer with master data that is organized using the data vault model. We will call this layer BID
  • Virtual Analytical Layer - this layer is where we define our basic calculation views that will feed the data mart layer. We will call this layer VAL
  • Data Mart Layer - this layer is where we define our star schema that will be consumed by analytical tools. We will call this layer DM

We will not be implementing the following layers, which many customers regard as less important:

  • Corporate Memory
  • Agile Area

Project Repository Overview

The database artifacts we will create will be stored in an SAP HANA Database (HDB) module. An SAP HANA database module is a collection of related design-time database artifacts, such as data models, views, procedures, and, of course, tables. When you first build the HDB module, a run-time container is automatically generated, and the design-time objects of the database module become run-time objects in the corresponding HDI container. When you create an HDB module in Web IDE, the folders cfg and src are created automatically for you. The cfg folder stores configuration files for the module, for example the services.hdbgrants file, which is used to manage roles and privileges for the module.
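As an illustration, a services.hdbgrants file might look like the following sketch. The service name SRC_SCHEMA_UPS and the schema SRC_SCHEMA are hypothetical, assuming a user-provided service exposes an external source schema:

```json
{
  "SRC_SCHEMA_UPS": {
    "object_owner": {
      "schema_privileges": [
        { "reference": "SRC_SCHEMA", "privileges_with_grant_option": ["SELECT"] }
      ]
    },
    "application_user": {
      "schema_privileges": [
        { "reference": "SRC_SCHEMA", "privileges": ["SELECT"] }
      ]
    }
  }
}
```

The object_owner section grants privileges to the technical owner of the container objects (with grant option, so views can expose the data), while application_user covers the users who connect at run time.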

The src folder is used to develop the actual database artifacts. In our example, the src folder contains individual sub-folders that represent the layers of the reference architecture - namely the Stage area, the Raw and Business Integrated Data Warehouses, the Virtual Analytical Layer, and the Data Mart layer. These folders are used to organize the source files so developers have a clear view of the data warehouse architecture. During a build, all source files across all folders will appear in the same container. One of the added benefits of using folders is that you can provide a namespace file within each folder so that during a build, the run-time object name automatically includes the name you have provided in the namespace file. This makes it possible to identify the source folder for each run-time object. In our case, this means we could easily identify the layer of the data warehouse for each object in the database.
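As a sketch, a .hdbnamespace file placed in the STAGE folder could look like this (the namespace name STAGE is our own choice and purely illustrative):

```json
{
  "name": "STAGE",
  "subfolder": "append"
}
```

With "subfolder": "append", the names of any nested sub-folders are appended to the namespace, so the run-time object name reveals exactly where in the folder structure its source file lives.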

Note

By default, the namespace files are hidden in Web IDE so you need to enable hidden files so they appear in the folder structure. This is done using a toolbar button in Web IDE in the project area.

Staging Area Folder

Our Staging folder belongs to the Acquisition Layer and is used for persistent integration. Data loaded into this layer is usually temporary and is deleted once it has moved to the next layer. Think of this layer as data on its journey to a permanent store. No enhancements are made to the data in this layer, except that you might add a timestamp to each record to log its load date/time. This can be useful later if you need to implement delta handling.
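As a sketch of the load-timestamp idea in plain SQL (in practice a flowgraph performs this load, and the table names SRC_SALES_ORDERS and STAGE_SALES_ORDERS are hypothetical):

```sql
-- Illustrative only: a flowgraph would normally perform this load.
-- Each staged record is stamped with the load time to support later delta handling.
INSERT INTO STAGE_SALES_ORDERS (ORDER_ID, CUSTOMER_ID, ORDER_DATE, LOAD_TS)
SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE, CURRENT_UTCTIMESTAMP
FROM SRC_SALES_ORDERS;
```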

In this layer we find data pipelines which we call flowgraphs that move the data from the original source tables to the staging tables of this layer. In this course we have already loaded data from the source system to SAP HANA. The original source tables are managed in an external database schema. We won't cover how our source data was loaded to SAP HANA, but you can assume an external ETL tool moved the data to SAP HANA without making any changes.

The Stage folder contains three sub-folders: etl, synonyms, and tables. To define the data pipelines that move the data from the source tables in the external schema to the STAGE layer, we will create SAP HANA flowgraphs. Flowgraphs are stored in the etl folder.

For our data warehouse, we will need synonyms too, and these are stored in the synonyms folder. Synonyms are required whenever we read from tables or views that are located in the SAP HANA database but belong to an external database schema. This is the case for our source tables, which reside in an external schema in the SAP HANA database. You will therefore see in our example how synonyms are used as source objects in all flowgraphs of this layer.
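A design-time synonym is defined in a .hdbsynonym file. The following is a minimal sketch, assuming a hypothetical source table SALES_ORDERS in an external schema SRC_SCHEMA:

```json
{
  "SALES_ORDERS_SRC": {
    "target": {
      "object": "SALES_ORDERS",
      "schema": "SRC_SCHEMA"
    }
  }
}
```

The flowgraphs in this layer would then read from SALES_ORDERS_SRC rather than addressing the external schema directly.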

If we were implementing a Data_InHub (virtual) scenario, we would need to define virtual tables within our project. Virtual tables connect directly to tables in remote systems. They are easy to set up in Web IDE: you simply define a source file that points to a table in a remote system. SAP HANA then reads the metadata of the remote table, so that a virtual table can be automatically generated as an exact copy of the source table's structure. Our flowgraphs would then use virtual tables instead of synonyms as the data source.
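Such a source file is a .hdbvirtualtable artifact. A minimal sketch might look like this, where MY_REMOTE_SOURCE, SRC_SCHEMA, and SALES_ORDERS are hypothetical names for the remote source, remote schema, and remote table:

```sql
-- Contents of a hypothetical VT_SALES_ORDERS.hdbvirtualtable file.
-- "<NULL>" stands for the remote database segment when it does not apply.
VIRTUAL TABLE VT_SALES_ORDERS
  AT "MY_REMOTE_SOURCE"."<NULL>"."SRC_SCHEMA"."SALES_ORDERS"
```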

The final folder is tables and this is where we store the definitions of the target tables for this layer. We will use .hdbcds source files to define our tables.
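As a preview, a staging table defined in a .hdbcds file could look like the following sketch; the context and entity names, and the columns, are hypothetical:

```text
// Hypothetical Stage.hdbcds file defining one staging target table.
context Stage {
    entity SalesOrders {
        key ORDER_ID : Integer;
        CUSTOMER_ID  : Integer;
        ORDER_DATE   : LocalDate;
        LOAD_TS      : UTCTimestamp;  // load timestamp for delta handling
    };
};
```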

We will go into more details about flowgraphs, synonyms and tables, later in this course.

RAW Data Warehouse Folder

Similar to the STAGE folder, the RAW folder also contains an etl sub-folder for flowgraphs. We don't use synonyms in this layer, so we don't need a synonyms folder. A tables sub-folder contains the source code that defines the target tables for this layer. We will define our tables using CDS source files that have the extension .hdbcds. These files are discussed in more detail later.

Business Integrated Data Warehouse Folder

As with the RAW folder, the BID folder contains an etl sub-folder for flowgraphs and a tables sub-folder for the target tables of this layer. In addition, this layer uses a sequences sub-folder: .hdbsequence files are used in the historization process of incoming data to generate surrogate keys. The purpose and functionality of sequences are discussed in more detail later.
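For illustration, a .hdbsequence file is short. The sequence name below is hypothetical, assuming it supplies surrogate keys for a customer hub table:

```text
-- Hypothetical CUSTOMER_HK_SEQ.hdbsequence file.
-- Each NEXTVAL call yields the next surrogate key during historization loads.
SEQUENCE CUSTOMER_HK_SEQ
  INCREMENT BY 1 START WITH 1
```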

Virtual Analytical Layer Folder

The VAL folder contains a views sub-folder, which stores the calculation views that are necessary to implement the Virtual Analytical Layer.

Data Mart Folder

The DM folder contains a views sub-folder, which stores the calculation views that are necessary to implement the Data Mart.
