Working with Flowgraphs

Objective

After completing this lesson, you will be able to provision data to the data warehouse

Data Provisioning and Transformation

As we described earlier, all of our source tables are provided in an external schema within the SAP HANA database. . We will now load these tables to the next STAGE. To achieve this we will create flowgraphs in Web IDE.

Flowgraph Editor

The Flowgraph Editor is a data integration tool that is accessed via Web IDE. The tool enables developers to create powerful data loading and transformation flows with the convenience of a graphical user interface. The extension for a flowgraph source file is .hdbflowgraph.

Flowgraphs follow a modular construction principle - they consist of nodes that are connected with each other through inputs and outputs. Every node represents a function that executes a certain task, for example, load a table, add a column, or create a new table. Nodes can be placed, configured, and connected anywhere within the flowgraph.

To construct a flowgraph in SAP Web IDE you simply put the nodes you need on the canvas, adjust the configuration by setting filters, join attributes, adding new attributes, and so on, and connect them. When you're finished you choose the Execute button in the Editor to test the data flow. In an productive environment you would change the runtime behavior of flowgraphs to real-time tasks for replication scenarios, or to procedures so you can schedule the execution in batch mode.

For our reference architecture flowgraphs we will use the following node types:

  • Data Source: Represents the source of the data that is being transformed within the flowgraph. These are our synonyms that point the tables in the provided external schema.
  • Data Target: Represents the target object, the resulting objects of the data transformation processes. We provide the table definitions in CDS source files within our project.
  • Projection: Represents a relational selection (filter) combined with a projection operation. It also allows calculated attributes to be added to the output.
  • Join: Represents a relational multi-way join operation.
  • Lookup: Retrieves a column value or values from a lookup table that match a lookup condition you define.
  • Table Comparison: Compares two tables and produces the difference between them as a dataset.

Flowgraph Nodes: Data Source

Let's take a closer look at the data source node that begins the data flow.

In the Data Source node you set the source data that is will be transformed within the flowgraph. You can either choose the Table Type option or the HANA Object option. If you choose the Table Type option you can add columns yourself or import columns from existing tables without setting a reference to the table itself. Choosing the HANA Object option enables you to create a reference to an existing SAP HANA database object. You can decide if you want to partition the data. Data partitioning is used to separate large data sets into smaller sets based on a set of defined criteria. These partitions can be run serially or in parallel. Partitioning data can be helpful when you are initially loading a large data set, because it can improve performance.

Flowgraph Nodes: Data Target

Let's take a closer look at the data target node that appears at the end of the data flow.

Data Target nodes work similarly to Data Source nodes. You can choose from the Table Type and HANA Object options to set the destination for the flowgraph output. Additionally, in the Data Target, there is a Template Table option that lets you create a new HANA database table on the basis of the flowgraph output. The Data Target settings can be used to change the data creation behavior. For example, you may want to truncate the data target every time the flowgraph is execute, or you may want to specify the writer type (choose from Insert, Upsert, and Update). Or you may want to set a key generation attribute to generate a new key for the target data, starting from a value based on existing keys in the column you specify or based on a external sequence. The latter option is an important feature for the reference architecture. And last but not least, in the Data Target node you can specify a logical key for the output object.

Flowgraph Nodes: Projection

Almost all flowgraphs will include one or more projection nodes.

Projection allows you to add or delete attributes from a table, change the attribute mapping names, and perform SQL statement mappings for an attribute. Also, using the Filters menu, you can perform relational selections and, thus, decide on the selection of data that is to be used in the flowgraph.

Flowgraph Nodes: Join

Using the Join node you can perform multiple step joins on two or more inputs. In the Columns view of the node you can select which of the input columns should also be the resulting output columns. In the Criteria view you can define the actual join statement with a left and right join partner, the type of join (inner join, left outer, right outer, full outer, cross-join), and the join condition. Also, in the Filter view you can define specific filter statements to only process a selection of data.

Flowgraph Nodes: Lookup

Using the Lookup node you can retrieve a column value or values from a lookup table that match a lookup condition you define. This way you can check, for example, if a business key value already exists in a target table, and return the result of the lookup in a new column. Afterwards, this new column can be used to filter data rows out in a projection node.

Flowgraph Nodes: Table Comparison

The table comparison node compares two datasets and produces the difference between them as a data set with rows flagged as INSERT, UPDATE, or DELETE. The operation generates an Op_Code to identify records to be inserted, deleted, or updated to synchronize the comparison table with the input table. Similarly to the lookup node, this can be used to check if, for example, a business key value already exists in a target table. You can decide if you want to compare all fields in a record or only selected fields. You can also define a filter condition to ensure you are only comparing the input to records in the comparison table that fit a certain criteria. For example, compare only to the current version of a record and not the historic versions. The Generated Key Attribute setting allows you to handle a case where multiple comparison records might be found. In that case the key attribute setting is used to point to a column in the comparison table where usually a surrogate key is found so the highest number can be selected. This selection should represent the most up-to-date record in the set. The table comparison is a very powerful node and is very useful to handle delta scenarios, especially when the source system cannot deliver a column that indicates whether the incoming record is new, changed or deleted.

Debugging Flowgraphs

During the development of a flowgraph, it is often helpful to view the data as it flows through the interim nodes so that you can check if the results are as expected. If your data source node or data target node is assigned to a database table, it is possible to view the table data even before saving and building the flowgraph by clicking the data preview button in those nodes. However, it is not possible the view the results of the interim nodes until you perform some simple setup steps which are shown below in the diagram.

The basic steps to setup debugging of interim nodes are:

  1. Enable debug mode so you then see the debug on/off selectors
  2. Select the interim node that you would like to debug
  3. Choose the setting Data Preview : On so you then see the data preview icon appear in the selected node. You can repeat steps 2 and 3 for each node you would like to enable for debugging.
  4. After saving and rebuilding the flowgraph, click the data preview button in the node you would like to debug to view the results of the output of the node. You do not execute the entire flowgraph to see the interim results.

Variables

You can pass variable values into flowgraphs to increase their flexibility. This is useful when you don't want to hard-code values into expressions within nodes, such as filters.

Variables are defined in the general properties of the flowgraph.

The values for the variable can be provided at run-time as fixed values, but they can also be generated at run-time using SQL. For example, you could count the number of orders in a month from a sales table and use this count in a filter expression of a projection node.

Log in to track your progress & complete quizzes