Describing SAP Data Services Integrator transforms

Objectives

After completing this lesson, you will be able to:

  • Describe the data services integrator transforms

Data Services Integrator Transforms

Data Services Integrator transforms are used to enhance your data integration projects beyond the core functionality of Platform transforms. In your projects, you encounter XML data with repeated nodes, hierarchy data, or sources of data where there are either too many fields or not enough fields. You find that the Platform transforms do not provide enough flexibility and so you turn to the Data Services Integrator-specific transforms for assistance.

Data Services Integrator transforms perform key operations on data sets to manipulate their structure as they are passed from source to target.

Besides Table_Comparison, History_Preserving, and Key_Generation, you can find the following transforms:

More Data Services Integrator Transforms

These transforms are available in the Data Integrator branch of the Transforms tab in the Local Object Library:

TransformDescription
Data TransferAllows a data flow to split its processing into two subdata flows and push down resource-consuming operations to the database server.
Date GenerationGenerates a column filled with date values based on the start and end dates and increment you specify.
Effective DateGenerates an additional effective to column based on the primary key’s effective date.
Hierarchy FlatteningFlattens hierarchical data into relational tables so that it can participate in a star schema. Hierarchy flattening can be both vertical and horizontal.
Map CDC OperationSorts input data, maps output data, and resolves before and after versions for UPDATE rows. While commonly used to support Oracle or mainframe changed data capture, this transform supports any data stream if its input requirements are met.
PivotRotates the values in specified columns to rows.
Reverse PivotRotates the values in specified rows to columns.
XML PipelineProcesses large XML inputs in small batches.

Let's describe the use cases for some of them.

The Date Generation Transform

Use the Date_Generation transform to produce the key values for a time dimension target as shown in the figure. From this generated sequence, populate other fields in the time dimension (such as day_of_week) using functions in a query.

Example: To create a time dimension target with dates from the beginning of the year 1997 to the end of the year 2000, place a Date_Generation transform, a query, and a target in a data flow. Connect the output of the Date_Generation transform to the query, and the output of the query to the target.

Date_Generation Transform Options

Inside the Date_Generation transform, specify the following Options:

  • Start date: 1997.01.01 (A variable can also be used.)
  • End date: 2000.12.31 (A variable can also be used.)
  • Increment: Daily (A variable can also be used.)

Computing Time Dimension Values

Inside the query, you can create new target columns, and define a mapping for different time dimension values. Here are some examples of functions you can use:

  • julian function : To create a unique date id.

    It's the number of days between the start of the Julian calendar and the given date.

  • month function: To extract the month number (1 to 12) from the given date.
  • quarter function: To extract the quarter number (1 to 4) from the given date.
  • to_char function: To extract a part of the given date in string format.
  • day_in_week function: To extract the number of the day (1 to7) in the week from the given date.
  • isweekend function: To output a boolean (1 or 0) to know if the given date is on a weekend or not.

Let's Try It

Let me guide you through the creation of a time dimension:

The Effective Date Transform

With the Effective_Date transform, as shown in the figure, an effective-to value is calculated for data that contains an effective date. The calculated effective-to date and an existing effective date produce a date range that allows queries based on effective dates to produce meaningful results.

The Map CDC Operation Transform

While commonly used to support relational or mainframe changed-data capture (CDC), the Map_CDC_Operation transform supports any data stream as long as its input requirements are met. Relational CDC sources include Oracle and SQL Server. This transform is typically the last object before the target in a data flow because it produces INPUT, UPDATE, and DELETE operation codes. Data Services produces a warning when other objects are used.

As shown in the figure, the Map_CDC_Operation transform is using input requirements (values for the Sequencing column and a Row operation column), to perform three functions

  1. Sort input data based on values in Sequencing column box and (optionally) the Additional Grouping column box.
  2. Map output data based on values in Row Operation column box. Source table rows are mapped to INSERT, UPDATE, or DELETE operations before passing them on to the target.
  3. Resolve missing, separated, or multiple before-and after-images for UPDATE rows.

We will look deeper into the Pivot and Data_Transfer transforms in the next lessons.

Log in to track your progress & complete quizzes