Creating a Simple Data Flow

We've already learned that the data replication approach copies the source data to the target database without changing the data. But in many cases, you need to make changes to the data during the data provisioning process.

Why would we change data during provisioning? Consider the following requirements:

Harmonize multiple, disparate data sources into a consistent format. For example, you need to align column lengths, data types, or even column values.
Convert values. For example, you need to convert currencies or switch products from old codes to new codes.
Calculate new values. For example, calculate profit or determine expiry dates.
Add missing fields. For example, add a constant value, add the current date, or add a missing country code that can be found using a look-up based on the city field.
Validate and reject records. For example, incomplete orders.

In these cases, we need to look at the data transformation approach.
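
To make these requirements concrete, here is a minimal sketch in plain Python. It is not flowgraph syntax and does not use any SAP HANA API. The record layout, the look-up tables, and the 30-day expiry rule are all assumptions invented for the illustration.

```python
from datetime import date, timedelta

# Hypothetical look-up tables, invented for this illustration.
PRODUCT_CODE_MAP = {"OLD-100": "NEW-100"}
CITY_TO_COUNTRY = {"Berlin": "DE", "Paris": "FR"}


def transform(record, load_date):
    """Return a transformed copy of record, or None if the record is rejected."""
    # Validate and reject: an order without a quantity counts as incomplete here.
    if record.get("quantity") is None:
        return None

    out = dict(record)
    # Harmonize: align the data type and length of a column.
    out["customer_id"] = str(out["customer_id"]).zfill(6)
    # Convert: switch old product codes to new ones; unknown codes stay as they are.
    out["product"] = PRODUCT_CODE_MAP.get(out["product"], out["product"])
    # Calculate: derive profit and an expiry date (30 days is an assumed rule).
    out["profit"] = out["revenue"] - out["cost"]
    out["expiry_date"] = load_date + timedelta(days=30)
    # Add missing fields: the load date and a country code looked up from the city.
    out["load_date"] = load_date
    out["country"] = CITY_TO_COUNTRY.get(out["city"])
    return out


orders = [
    {"customer_id": 42, "product": "OLD-100", "quantity": 2,
     "revenue": 100, "cost": 60, "city": "Berlin"},
    {"customer_id": 7, "product": "NEW-200", "quantity": None,
     "revenue": 50, "cost": 20, "city": "Paris"},
]

transformed = (transform(order, date(2024, 1, 1)) for order in orders)
accepted = [record for record in transformed if record is not None]
```

Only the first order ends up in accepted; the second is rejected because its quantity is missing. In a flowgraph, each of these steps would be configured in a node rather than written as code.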

You implement data transformation in SAP HANA using a graphical object called a flowgraph. A flowgraph defines the steps of the data journey from source to target and includes all the transformation steps along the way.

The image shows a flowgraph with four sources, one target, and three intermediate nodes: Projection, Join, and Table Comparison.

It's possible to read from different data sources and to write to different data targets within the same flowgraph. The sources can be virtual tables, local tables or views. You can combine data and split it up for distribution.

Flowgraph Components

Typically, a flowgraph consists of three different types of objects:

At least one source
One or more intermediate nodes that are connected in sequence
At least one target to store the resulting data set

You create a flowgraph by creating a design-time file in your project, with the extension .hdbflowgraph. To simplify creation and maintenance, Web IDE and Business Application Studio provide a graphical editor. You select node types and drag them to the canvas. You then configure each node with specific settings to define what should happen when the data travels through the node.

The configuration of the Data Source node specifies which object is used as a source. Typically, it is an existing table or a view.

Between data source and data target, you implement other nodes to define the data transformation. A common transformation is the Projection node. It can be used for the following purposes:

Restrict the records (rows) based on a filter expression
Remove fields (columns)
Rename fields (columns)
Add new fields (columns) using SQL expressions
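
The four purposes above can be sketched in plain Python. This only illustrates what a projection does to a data set; it is not how the Projection node is configured. The function name project, its parameters, and the sample columns are hypothetical.

```python
def project(rows, keep, rename=None, computed=None, where=None):
    """Apply a projection to a list of dict rows.

    keep:     column names to retain (all other columns are removed)
    rename:   mapping of old column name -> new column name
    computed: mapping of new column name -> function(row) returning its value
    where:    predicate(row); rows for which it returns False are dropped
    """
    rename = rename or {}
    computed = computed or {}
    result = []
    for row in rows:
        # Restrict the records based on a filter expression.
        if where is not None and not where(row):
            continue
        # Remove and rename fields in one pass.
        out = {rename.get(col, col): row[col] for col in keep}
        # Add new fields from expressions evaluated on the original row.
        for name, expr in computed.items():
            out[name] = expr(row)
        result.append(out)
    return result


source = [
    {"id": 1, "first": "Ada", "last": "Lovelace", "status": "active", "note": "x"},
    {"id": 2, "first": "Alan", "last": "Turing", "status": "inactive", "note": "y"},
]

projected = project(
    source,
    keep=["id", "first", "last"],
    rename={"id": "customer_id"},
    computed={"full_name": lambda r: r["first"] + " " + r["last"]},
    where=lambda r: r["status"] == "active",
)
```

Here projected holds one row with the columns customer_id, first, last, and full_name. In the Projection node itself, the filter and the new fields are written as SQL expressions rather than Python functions.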

Storing the Result

For each target table, you need to provide a Data Target node to store the results of the transformation.

Note

There are other data target options besides tables. You will learn about them in the next lesson.

Creating and Calling a Flowgraph

The image displays a simple flowgraph with three nodes. On the left is a source with two columns and two rows. It is connected to a Projection node that adds a third column, based on the expression CONCAT(column1, column2). The Projection node is connected to a Data Target node that contains a mapping of columns. The target initially has one row. Two actions are then listed. 1. Deploy flowgraph: This step checks the transformation definition and creates a task or procedure. 2. Execute flowgraph: This step starts the task or procedure, processes the transformation, and fills the target table. At the end, the target table contains three rows.

Before you can apply the transformation, you need to deploy the flowgraph. During deployment, the definitions are checked and executable run-time objects are generated in the container (database schema) of your module. After successful deployment, you can execute the flowgraph manually from the graphical editor, or you can schedule it.
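
The separation between deploying and executing can be mimicked in plain Python. This is a conceptual sketch only: the dictionary layout, the deploy function, and the target column names A, B, and C are assumptions, and the real run-time object is a task or procedure in the database, not a Python function. The data follows the example described above: two source rows, a third column built by concatenation, and a target that already holds one row.

```python
def deploy(flowgraph):
    """Check the definition and return a callable that stands in for the run-time object."""
    known = set(flowgraph["source_columns"]) | set(flowgraph["computed"])
    for column in flowgraph["mapping"].values():
        if column not in known:
            raise ValueError(f"unknown column in mapping: {column}")

    def run(source_rows, target_rows):
        for row in source_rows:
            enriched = dict(row)
            # Projection step: add the computed columns.
            for name, expr in flowgraph["computed"].items():
                enriched[name] = expr(row)
            # Data target step: map columns and append the row to the target.
            target_rows.append(
                {tgt: enriched[src] for tgt, src in flowgraph["mapping"].items()}
            )
        return target_rows

    return run


flowgraph = {
    "source_columns": ["column1", "column2"],
    # Stands in for the expression CONCAT(column1, column2).
    "computed": {"column3": lambda r: r["column1"] + r["column2"]},
    "mapping": {"A": "column1", "B": "column2", "C": "column3"},
}

source = [{"column1": "a", "column2": "b"}, {"column1": "c", "column2": "d"}]
target = [{"A": "x", "B": "y", "C": "xy"}]  # the target already holds one row

task = deploy(flowgraph)   # deployment: checks only, no data is moved
task(source, target)       # execution: the target now holds three rows
```

Nothing is written to the target until the deployed object is executed, and a definition that fails the checks never produces an executable object.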

Let's have a closer look at the flowgraph settings.

The following table describes the flowgraph settings:

Options for Flowgraph Settings

Setting	Purpose	Created run-time objects	Comment
Batch task	Process data as a batch or initial load	A procedure; a task for batch load	All node types are valid.
Real-time task	Process data in real time	A procedure; a task for batch load; a task for processing updates in the sources in real time	Some node types aren't valid.
Transactional task	Process data in real time without initial load	A procedure; a task for processing updates in the sources in real time	Some node types aren't valid.
Procedure	Schedule or integrate the transformation in another procedure or flowgraph	Only a procedure	Data Provisioning nodes aren't valid.
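
For quick reference, the table can be restated as a small look-up structure in Python. The short labels for the run-time objects and the helper function are hypothetical and chosen only for this sketch; the content comes from the table.

```python
# Run-time objects generated per flowgraph setting, as listed in the table.
RUNTIME_OBJECTS = {
    "Batch task": ["procedure", "batch task"],
    "Real-time task": ["procedure", "batch task", "real-time task"],
    "Transactional task": ["procedure", "real-time task"],
    "Procedure": ["procedure"],
}


def settings_that_generate(runtime_object):
    """Return the settings whose deployment creates the given run-time object."""
    return [
        setting
        for setting, objects in RUNTIME_OBJECTS.items()
        if runtime_object in objects
    ]


with_initial_load = settings_that_generate("batch task")
with_realtime = settings_that_generate("real-time task")
```

Every setting generates a procedure. The settings differ only in whether a task for the batch load, a task for real-time updates, or both are generated in addition.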

In the following lessons, you’ll learn more details about possible sources and targets and about other transformation options and debug capabilities.

References

To go deeper in this topic, you might like to look at these sources from SAP Help Portal: