Develop a Custom Data Warehouse on SAP HANA

Data Provisioning and Transformation

Objectives
After completing this lesson, you will be able to:

After completing this lesson, you will be able to:

  • Explain how to create and edit synonyms in SAP Web IDE for data provisioning

Data Provisioning and Transformation

Staging Area: Data Provisioning via Synonyms and Virtual Tables

Following the reference architecture, data provisioning takes place via synonyms in the staging folder. Synonyms are aliases for database objects, but are not real database objects by themselves. Synonyms can be created for tables (this includes virtual tables), views, procedures, table functions, scalar functions and sequences. In our example, we created the synonyms in a shared file called erp.hdbsynonym which is located in the folder HDW410/BIonHana/src/STAGE/synonyms. To edit the synonyms, you can either use the Code Editor, for plain code-based programming, or the graphical Synonym Editor which enables GUI-oriented programming that does not require knowledge about JSON syntax, which is applied in the .hdbsynonym file.

Available Nodes for the Flowgraph UI

The Flowgraph UI is a Smart Data Integration (SDI) tool that enables developers to create powerful data loading and transformation flows with the convenience of a graphical user interface. Flowgraphs follow a modular construction principle - they consist of nodes that are connected with each other through inputs and outputs. Every node represents a function that executes a certain task (for example, load a table, add a column, or create a new table) and that can be placed, configured, and connected anywhere within the flowgraph. To construct a flowgraph in SAP Web IDE you simply put the nodes you need on the canvas, adjust the configuration (set filters, set join attributes, add new attributes, and so on), and connect them. When you're finished you choose Execute and the system performs the transformation for you. It is also possible to change the runtime behavior of flowgraphs (for example, real-time tasks, procedures) so you can schedule the execution.

For the reference architecture flowgraphs, we use the following node types:

  • Data Source: Represents the source of the data that is being transformed within the flowgraph.
  • Data Target: Represents the target object, the resulting objects of the data transformation processes.
  • Projection: Represents a relational selection (filter) combined with a projection operation. It also allows calculated attributes to be added to the output.
  • Join: Represents a relational multi-way join operation.
  • Lookup: Retrieves a column value or values from a lookup table that match a lookup condition you define.
  • Table Comparison: Compares two tables and produces the difference between them as a dataset.

Flowgraph Nodes

Flowgraph Examples

The following figures show a range of example flowgraphs for that are used to transform and harmonize data in the reference architecture introduced in this course.

In the Stage area, flowgraphs are used to load the source tables via synonyms, add a load date timestamp and a source, and then transfer the data into new staged tables.

Flowgraphs in the RAW Data Warehouse also load the source tables via synonyms. Loaded attributes, however, are not transformed. They remain unchanged and the source primary keys also become primary keys in the target tables. To integrate the data into the model, relationships to other tables are defined. Specifically, the tables CE1IDEA and CE2IDEA are put in relation with the CUSTOMER, PRODUCT, and STORE hubs. Additionally, new columns with the load date timestamp and the source are added to the tables.

Flowgraphs for the hub tables in the Business Integrated Data Warehouse load data from the staged source tables. Afterward, they transform the attributes according to the model requirements, check whether the assigned business key already exists, and then insert the data into the hub table. The surrogate keys are created in the Data Target node of each flowgraph, using a predefined sequence database object.

Sequence objects define rules for the incrementation of key attributes. They are stored as dedicated database objects and can be selected in flowgraph Data Target nodes.

Flowgraphs for the satellite tables in the Business Integrated Data Warehouse load data from both the staged source tables and the hub tables. Note that due to this dependency, hub table flowgraphs in the reference architecture always need to be executed before the satellite table flowgraphs. Once loaded, source and hub tables are joined to create a new table that includes both the business key and the surrogate key (as you may recall from the hub table flowgraph example, surrogate keys are created in the data targets of the hub table flowgraphs). Afterward, a table comparison is performed to check if the rows to be added already exist, before the data is added to the target table.

Like satellite and hub table flowgraphs, link table flowgraphs also load the data from the staged source tables. Then a lookup is performed to check if the data to be added already exists. Afterward, the loaded data is joined with the already loaded hub tables. Note that due to this dependency, hub table flowgraphs in the reference architecture always need to be executed before the link table flowgraphs. Once the table is joined, data is delivered to the link table.

Implement an SAP SQL Data Warehouse in SAP HANA Web IDE

Save progress to your learning plan by logging in or creating an account