Bringing changes to the target table

Objective

After completing this lesson, you will be able to Update data which changes slowly over time.

Changed Data Capture (CDC)

When you have a large amount of data to update regularly and a small amount of system down time for scheduled maintenance, you must choose the best method for a delta load or for updating your data over time.

Full Refresh and Changed Data Capture

You can choose to do a full refresh of your data or you can extract only new or modified data to update the target system:

  • Full Refresh:

    Full refresh is easy to implement and easy to manage. This method ensures that no data is overlooked or left out due to technical or programming errors. Use full refresh to perform a delta load to a target system in an environment with a manageable amount of source data.

  • Changed Data Capture:

    When an initial load is complete, you can extract only new or modified data and update the target system. Identifying and loading only changed data is called Changed Data Capture (CDC). CDC is recommended for large tables.

Changed Data Capture Benefits

  • Improves performance because the job takes less time to process with less data to extract, transform, and load.
  • The target system is able to track change history so that data can be correctly analyzed over time.

Setting up a full CDC solution within Data Services may not be required. Many databases now have CDC support built into them, such as Oracle, SQL Server, DB2 and SAP Sybase. Refer to the Designer Guide for more information: https://help.sap.com/docs/SAP_DATA_SERVICES/ec06fadc50b64b6184f835e4f0e1f52f/572111656d6d1014b3fc9283b0e91070.html?locale=en-US&q=preload%20sql

If you do want to set up a full CDC solution, you can choose source–based and/or target–based CDC.

CDC Solutions

  • Source–based CDC evaluates the source tables to determine what has changed and only extracts changed rows to load into the target tables.
  • Source–based CDC is preferable to target–based CDC for performance reasons.
  • Some source systems do not provide enough information to make use of the source–based CDC techniques.
  • Target–based CDC extracts all the data from the source, compares the source and target rows using a Table Comparison transform, and then loads only the changed rows into the target.
  • You can use a combination of source-based and target-based techniques.

History Preserving

When updating data, you need to decide if you want to track your changes or not.

Data Changes Management

There are three ways to manage data changes:

  • No history preserving
  • Limited history preserving
  • Unlimited history preserving

No History Preserving

If history preserving is not necessary, you can just update the changed row.

The following example updates a sales person name, without tracking changes.

Target Table Before Data Change

The table shows the data before the change:

SALES_PERSON_IDNAMESALES_TEAM
000120Doe, John BNorthwest

Target Table After Data Change Without History

The table shows the data when the salesperson's name has been changed:

SALES_PERSON_IDNAMESALES_TEAM
000120Smith, John BNorthwest

Limited History Preserving

If you need to track changes, but without storing all the historical values, you must add two additional fields per updatable column, one to record the new value and one to record the date of the change.

  • You can preserve only one change per attribute, such as old and new or first and last.
  • Each change requires a minimum of one additional field per attribute and another additional field if you want to record the date of the change.
  • Although the table’s structure contains all the data needed, the SQL code required to extract specific information can be complex.

Limited History Preserving can store a change in data, but cannot accommodate multiple changes or adequately serve the need for summary reporting.

The following example shows the implementation of a limited history preserving for the sales team value.

Target Table Before Data Change

The table shows the data in the target table before the change:

SALES_PERSON_IDNAMESALES_TEAMOLD_TEAMEFF_TO_DATE
000120Smith, John B.NorthwestNULLNULL

Target Table After Data Change With Limited History Preserving

The table shows that the salesperson’s sales team has been changed:

SALES_PERSON_IDNAMESALES_TEAMOLD_TEAMEFF_TO_DATE
000120Smith, John B.SoutheastNorthwestOct_31_2004

Unlimited History Preserving

Unlimited History Preserving resolves most of the issues related to data changes management:

  • Generates new rows for significant changes.
  • Requires the use of a unique key.
  • Optionally adds an Effective_Date field or two Valid_From and Valid_To fields.
  • Optionally adds an IsActive field.

    You can then filter by IsActive = Y to view current rows versus expired rows.

To preserve history, you need to manage the constraint issues in your target tables that arise when you have more than one record in your tables for a single entity, such as a customer or an employee.

For example, with your sales records, the Sales Person ID is the primary key and is used to link that record to all of the representative’s sales orders. If you try to add a new record with the same primary key, it causes an exception. On the other hand, if you assign a new Sales Person ID to the new record for that representative, you compromise your ability to report accurately on the representative’s total sales.

To address this issue, create a surrogate key as a new column in the target table. The surrogate key becomes the new primary key for the records. At the same time, you change the properties of the former primary key so that it is simply a data column.

When a new record is inserted for the same representative, a unique surrogate key is assigned, allowing you to continue to use the Sales Person ID to maintain the link to the representative’s orders.

Target Table Before Data Change

The table shows the data before the change:

SALES_PERSON_KEYSALES_PERSON_IDNAMESALES_TEAM
15000120Doe, John BNorthwest

Target Table After Data Change

When you implement unlimited history preserving, two records appear as shown in this table:

SALES_PERSON_KEYSALES_PERSON_IDNAMESALES_TEAM
15000120Doe, John BNorthwest
133000120Doe, John BSoutheast

Updates to target tables

SAP Data Services flags each row internally in a data set with an operation code that identifies the status of the row.

The following table describes how each operation code can be obtained and how Data Services processes rows with the operation code.

Describing Operation Codes

The operation codes indicate how each row in the data set is applied to the target table. The operation codes are:

Operation CodeContextResult
NORMALRow extracted from a source.Creates a new row in the target.
INSERTRow in a data set has been added when compared to an earlier image of the same data set.Creates a new row in the target.
UPDATERow in a data set has changed when compared to an earlier image of the same data set.Overwrites an existing row in the target.
DELETERow in a data set has been removed when compared to an earlier image of the same data set.Deletes the row from the target.

Most transforms operate only on rows flagged as NORMAL. So, for example, a Query transform will always output rows with the NORMAL operation code.

Note

This operation code generates an insert statement in the target table. But if the auto correct load option is set on the target, the actual action can become an update.

The Table_Comparison transform is used to compare two data sets and can generate INSERT, UPDATE or DELETE operation codes.

The History_Preserving transform can then be used to change updates into inserts and deletes into updates, for example.

Another transform can be used to change operation codes: The Map_Operation transform.

Introduction to the Map Operation Transform

The Map_Operation transform allows for the changing of operation codes on data sets to produce the desired output. For example, if a row in the input data set has been updated in some previous operation of the data flow, use this transform to map the UPDATE operation to an INSERT . The result could be to convert UPDATE rows to INSERT rows to preserve the existing row in the target.

The Map Operation Transform

As shown in the figure, the Map Operation Transform:

  • Explicitly overrides the operation codes
  • Can discard rows with specific operation codes
  • Is used for subsequent transform compatibility and to control writing to the target.

Input for the Map Operation transform is a data set with rows flagged with any operation codes.

The Map Operation transform enables the setting of the output row type option to indicate the new operations desired for the input data set. Choose from the following operation codes: INSERT, UPDATE, DELETE, NORMAL, or DISCARD.

Here is an example:

Optionally, you can write mapping expressions for each column, per row type.

Mapping Expressions

When you write mapping expressions per column and per row type (INSERT/UPDATE/DELETE) you can:

  • Change the value of data for a column.
  • Execute different expressions on a column, based on its input row type.
  • Use the before_image function to access the before image value of an UPDATE row.

You could, for example, load data from a sales delivery source, change inserts into updates and change the value of the sales orders STATUS column to 'DELIVERED'.

Log in to track your progress & complete quizzes