Defining Data Sources and Data Targets

Objective

After completing this lesson, you will be able to configure a data source and data target in a flowgraph.

Data Sources

Let's have a closer look at sources and targets.

Each Data Source node connects the flowgraph to a source of data, usually an existing SAP HANA object. The source doesn't need to be a physical table; it can be any object in your HDI container that provides a data set, even a table type. (A table type only defines a table structure; it must be instantiated at runtime.)

The image shows a Data Source node and its source options: virtual table, table, view, calculation view, synonym, or table type.

A flowgraph doesn't need to begin with a Data Source node. It is possible to generate new data using two special nodes:

  • The Row Generator creates one column that contains row IDs.
  • The Date Generator creates one column that contains generated date values.

In both cases, you can specify a start value and an end value. The row generator generates a sequence of integers.
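To make the row generator's output concrete, here is a minimal Python sketch. It is not SAP code and only simulates the generated column. It assumes that both the start and end values are inclusive and that the step is 1; the function name is hypothetical.

```python
def row_generator(start: int, end: int) -> list[int]:
    """Simulate a row generator: one column of consecutive integer row IDs.

    Assumptions (not confirmed SAP behavior): start and end are both
    inclusive and the step is 1. If end < start, no rows are produced.
    """
    return list(range(start, end + 1))


if __name__ == "__main__":
    print(row_generator(1, 5))  # [1, 2, 3, 4, 5]
```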

For the date generator node, the following increment step options are available:

  • DAILY
  • WEEKLY
  • MONTHLY

The image shows the Row Generator and Date Generator nodes: the Row Generator produces a column with the values 1, 2, 3, and so on. The Date Generator with the increment WEEKLY produces a column with 2024-01-07, 2024-01-14, 2024-01-21, and so on.

For example, to generate a column that contains the dates of all Sundays in 2024, set the start date to January 7, 2024 (the first Sunday of the year), the end date to December 31, 2024, and the increment to WEEKLY.
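The Sundays example can be checked with a short Python simulation. This is a hypothetical sketch, not the SAP HANA node: the function names are illustrative, and it assumes the end date is inclusive. How the real node handles MONTHLY steps from a day that doesn't exist in every month (such as the 31st) isn't described here; the sketch clamps to the last day of the month, which is an assumption.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift d by whole months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def date_generator(start: date, end: date, increment: str) -> list[date]:
    """Simulate a date generator: one column of dates from start up to end (inclusive)."""
    increment = increment.upper()
    if increment not in ("DAILY", "WEEKLY", "MONTHLY"):
        raise ValueError(f"unsupported increment: {increment}")
    dates = []
    step = 0
    while True:
        if increment == "DAILY":
            current = start + timedelta(days=step)
        elif increment == "WEEKLY":
            current = start + timedelta(weeks=step)
        else:
            # Always offset from the start date, so a clamped day
            # (e.g. Jan 31 -> Feb 29) doesn't shift later months.
            current = add_months(start, step)
        if current > end:
            break
        dates.append(current)
        step += 1
    return dates


if __name__ == "__main__":
    sundays = date_generator(date(2024, 1, 7), date(2024, 12, 31), "WEEKLY")
    print(len(sundays), sundays[0], sundays[-1])  # 52 2024-01-07 2024-12-29
```

With the Sundays settings, the sketch produces 52 dates from 2024-01-07 to 2024-12-29. Because the end date is only an upper bound, December 31 itself is not generated.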

Data Targets

A data target node sits at the end of the flowgraph and is used to define the destination of the data flow.

Hint

What can you do if you don't yet have an existing table with suitable data types?

You can define the target as a template table. If you choose a template table, a new table is automatically proposed based on the output structure of the predecessor node.

The image shows how a target table is used: (1) when you deploy a flowgraph, the transformation definition is checked; (2) when you execute the flowgraph, the procedure is started, the transformation is processed, and the target table is filled.

You can remove proposed columns from the template table, but it's better practice to remove them earlier in the flow, which improves runtime performance. The table is created when the flowgraph is deployed and, like any other table, is filled when the flowgraph is executed. Template tables are very useful during the design phase: when you add columns to or remove columns from the predecessor node, you don't have to repeat these changes for the target.
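The split between deployment and execution, and the advice to drop columns early, can be illustrated with a small in-memory model. Everything here (the `TemplateTable` class, the `projection` helper, the column names) is hypothetical and only mirrors the behavior described in this lesson; it is not how SAP HANA implements template tables.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateTable:
    """Hypothetical in-memory model of a template target table (not an SAP API)."""

    name: str
    columns: list[str] = field(default_factory=list)
    rows: list[dict] = field(default_factory=list)
    created: bool = False

    def deploy(self, predecessor_columns: list[str]) -> None:
        # Deployment: take the structure from the predecessor's output and
        # create an empty table. (Simplification: earlier rows are discarded.)
        self.columns = list(predecessor_columns)
        self.rows = []
        self.created = True

    def execute(self, predecessor_rows: list[dict]) -> None:
        # Execution: fill the table that deployment created.
        if not self.created:
            raise RuntimeError("deploy the flowgraph before executing it")
        for row in predecessor_rows:
            self.rows.append({col: row[col] for col in self.columns})


def projection(rows: list[dict], keep: list[str]) -> list[dict]:
    """Drop unneeded columns early, before later nodes have to carry them."""
    return [{col: row[col] for col in keep} for row in rows]


if __name__ == "__main__":
    source = [
        {"ID": 1, "NAME": "Anna", "NOTE": "x" * 1000},
        {"ID": 2, "NAME": "Ben", "NOTE": "y" * 1000},
    ]
    keep = ["ID", "NAME"]
    projected = projection(source, keep)

    target = TemplateTable("CUSTOMER_COPY")
    target.deploy(keep)  # the target's structure follows the projection's output
    target.execute(projected)
    print(target.columns, len(target.rows))  # ['ID', 'NAME'] 2
```

If `keep` changes, only the projection needs editing; the template target picks up the new structure on the next deployment.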

The image shows a Data Target node and its target options: table, synonym, or table type.

When you load data into a data target that already contains data, you must specify how new and existing records are handled. This also applies to empty tables and to template tables after the first execution.

The following options exist:

  • Truncate: Delete all existing records, then load the new records into the empty table.
  • Insert: Add new rows alongside the existing ones. For this option, define a sequence as the key generator; it supplies the next unused integer as the row number.
  • Update: Overwrite existing records with additional or more current information. This option requires a sequence and either a table with a primary key or key fields defined on the template target.
  • Upsert: Insert new records and update changed ones. This option requires a sequence and either a table with a primary key or key fields defined on the template target.
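The four load options can be compared with a small Python simulation. The target is modeled as a dictionary keyed by the primary key, and the sequence as a Python counter that starts at the next unused integer. This is a hypothetical sketch of the row-level effects only. It doesn't model SAP HANA's actual writer, and it uses the sequence only for Insert; the `load` function and the `ID`/`CITY` columns are illustrative names.

```python
from itertools import count
from typing import Iterator, Optional


def load(target: dict[int, dict], new_rows: list[dict], mode: str,
         key: str = "ID", sequence: Optional[Iterator[int]] = None) -> None:
    """Apply new_rows to target (a dict keyed by primary key) using one load mode.

    Hypothetical simulation of the four options described above; not SAP code.
    """
    mode = mode.upper()
    if mode == "TRUNCATE":
        target.clear()                        # delete all existing records
        for row in new_rows:
            target[row[key]] = dict(row)      # fill the now-empty table
    elif mode == "INSERT":
        if sequence is None:
            raise ValueError("INSERT needs a sequence to generate row numbers")
        for row in new_rows:
            row_id = next(sequence)           # next value from the key generator
            target[row_id] = {**row, key: row_id}
    elif mode == "UPDATE":
        for row in new_rows:
            if row[key] in target:            # only rows whose key already exists
                target[row[key]].update(row)
    elif mode == "UPSERT":
        for row in new_rows:
            if row[key] in target:
                target[row[key]].update(row)  # existing key: update the record
            else:
                target[row[key]] = dict(row)  # new key: insert the record
    else:
        raise ValueError(f"unknown load mode: {mode}")


if __name__ == "__main__":
    table = {1: {"ID": 1, "CITY": "Berlin"}, 2: {"ID": 2, "CITY": "Paris"}}
    load(table, [{"ID": 2, "CITY": "Lyon"}, {"ID": 3, "CITY": "Rome"}], "UPSERT")
    print(sorted(table), table[2]["CITY"])  # [1, 2, 3] Lyon

    # The counter starts at the next unused integer, like a key-generating sequence.
    load(table, [{"CITY": "Oslo"}], "INSERT", sequence=count(max(table) + 1))
    print(table[4])  # {'CITY': 'Oslo', 'ID': 4}
```

The sketch shows why Update and Upsert depend on a key: without one, there is no way to decide which existing record a new row should overwrite.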