Defining Data Sources and Data Targets

Objective

After completing this lesson, you will be able to configure a data source and a data target in a flowgraph.

Data Sources

Let's have a closer look at sources and targets.

Each Data Source node connects the flowgraph with a source of data, usually an existing SAP HANA object. It does not need to be a physical table, but can be any object of your HDI container that provides a data set, even a table type. (A table type is a definition of a table structure, and it needs to be instantiated at runtime.)

A flowgraph doesn't need to begin with a Data Source node. It is possible to generate new data using two special nodes:

The Row Generator creates one column that contains row IDs.
The Date Generator creates one column that contains generated date values.

In both cases, you specify a start value and an end value. The Row Generator produces a sequence of integers.

For the Date Generator node, the following increment step options are available:

DAILY
WEEKLY
MONTHLY

For example, to generate a column that contains the dates of all Sundays in 2024, define a start date of January 7, 2024 (the first Sunday of the year), an end date of December 31, 2024, and a date increment of WEEKLY.
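The behavior of the two generator nodes can be approximated outside of SAP HANA with a short Python sketch. This is not the flowgraph implementation. The function names generate_row_ids and generate_dates are hypothetical. The sketch also assumes that the end value is inclusive and that a MONTHLY step falls back to the last day of a shorter month.

```python
import calendar
from datetime import date, timedelta


def generate_row_ids(start, end):
    """Return the integers from start to end (end assumed inclusive)."""
    return list(range(start, end + 1))


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length (assumption)."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def generate_dates(start, end, step):
    """Return the dates from start to end using a DAILY, WEEKLY, or MONTHLY step."""
    if step not in ("DAILY", "WEEKLY", "MONTHLY"):
        raise ValueError(f"unsupported increment step: {step}")
    result = []
    n = 0
    while True:
        if step == "DAILY":
            current = start + timedelta(days=n)
        elif step == "WEEKLY":
            current = start + timedelta(weeks=n)
        else:
            current = add_months(start, n)
        if current > end:
            break
        result.append(current)
        n += 1
    return result


# All Sundays of 2024, as in the example above.
sundays = generate_dates(date(2024, 1, 7), date(2024, 12, 31), "WEEKLY")
```

Because the end date only acts as an upper bound, the last generated value here is December 29, 2024, the final Sunday on or before December 31.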

Data Targets

A Data Target node sits at the end of the flowgraph and defines the destination of the data flow.

Hint

What can you do if you don't yet have an existing table with suitable data types?

You can define the target as a template table. If you choose a template table, a new table is automatically proposed based on the output structure of the predecessor node.

You can remove proposed columns from the template table, but it's better practice to remove the columns earlier in the flow to improve runtime performance. The table is created during deployment of the flowgraph. Like any other table, it's filled when the flowgraph is executed. Template tables are very useful during the design phase. When you add or remove columns from the predecessor node, you don't have to repeat these steps for the output.
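As a rough illustration of how a template table follows its predecessor, the proposal step can be pictured as copying the predecessor's output columns. The following Python sketch is only a conceptual model. The function propose_template_columns, the column names, and the data types are all hypothetical.

```python
def propose_template_columns(predecessor_output, removed=()):
    """Return the (name, type) pairs proposed for a template table.

    predecessor_output: list of (column name, data type) pairs.
    removed: names of proposed columns the designer chose to drop.
    """
    return [(name, dtype) for name, dtype in predecessor_output if name not in removed]


# Hypothetical output structure of the node in front of the data target.
projection_output = [
    ("ID", "INTEGER"),
    ("NAME", "NVARCHAR(50)"),
    ("CREATED_ON", "DATE"),
]

proposed = propose_template_columns(projection_output)
trimmed = propose_template_columns(projection_output, removed={"CREATED_ON"})
```

In this model, a change to projection_output changes the proposal without any further work on the target side, which is the convenience that template tables offer during design.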

When you load to a data target that already contains data, you need to specify how the new and existing records are handled. This also applies to empty tables or template tables after the first execution.

The following options exist:

Truncate: Delete all existing records, then load the new records into the emptied table.
Insert: Add new rows in addition to the existing ones. For this option, define a sequence as a key generator that finds the next unused integer as a row number.
Update: Overwrite existing records with additional or more current information. This option requires that you define a sequence and either use a table with a primary key or define the key fields of the template target.
Upsert: Insert new records and update changed ones. This option requires that you define a sequence and either use a table with a primary key or define the key fields of the template target.
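
The difference between the four options can be illustrated with a small in-memory simulation in Python. This is a sketch under stated assumptions, not the way SAP HANA executes a flowgraph. The function load, the ROW_ID column, the CUSTOMER key field, and the sample rows are hypothetical, and the sequence is modeled with itertools.count.

```python
from itertools import count


def load(target, incoming, mode, sequence, key_fields=("CUSTOMER",)):
    """Apply incoming rows to target (a list of dicts) and return the new content."""

    def key_of(row):
        return tuple(row[field] for field in key_fields)

    if mode == "TRUNCATE":
        # Existing rows are discarded; only the new rows remain.
        return [{"ROW_ID": next(sequence), **row} for row in incoming]
    if mode == "INSERT":
        # New rows are appended; the sequence supplies unused row numbers.
        return target + [{"ROW_ID": next(sequence), **row} for row in incoming]
    if mode in ("UPDATE", "UPSERT"):
        result = [dict(row) for row in target]  # copy so the input stays unchanged
        index = {key_of(row): row for row in result}
        for row in incoming:
            existing_row = index.get(key_of(row))
            if existing_row is not None:
                existing_row.update(row)  # overwrite the matching record
            elif mode == "UPSERT":
                new_row = {"ROW_ID": next(sequence), **row}
                result.append(new_row)
                index[key_of(new_row)] = new_row
            # In UPDATE mode, rows without a matching key are ignored.
        return result
    raise ValueError(f"unsupported mode: {mode}")


existing = [
    {"ROW_ID": 1, "CUSTOMER": "A", "CITY": "Berlin"},
    {"ROW_ID": 2, "CUSTOMER": "B", "CITY": "Paris"},
]
incoming = [
    {"CUSTOMER": "B", "CITY": "Lyon"},
    {"CUSTOMER": "C", "CITY": "Rome"},
]

truncated = load(existing, incoming, "TRUNCATE", count(1))  # 2 rows: B, C
inserted = load(existing, incoming, "INSERT", count(3))     # 4 rows: A, B, B, C
updated = load(existing, incoming, "UPDATE", count(3))      # 2 rows: A, B (Lyon)
upserted = load(existing, incoming, "UPSERT", count(3))     # 3 rows: A, B (Lyon), C
```

In this model, only Update and Upsert compare key fields, which is why those two options depend on a primary key or on key fields defined for the template target.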
