When you have a large amount of data to update regularly and a small amount of system down time for scheduled maintenance, you must choose the best method for a delta load or for updating your data over time.
Full Refresh and Changed Data Capture
You can choose to do a full refresh of your data or you can extract only new or modified data to update the target system:
- Full Refresh:
Full refresh is easy to implement and easy to manage. This method ensures that no data is overlooked or left out due to technical or programming errors. Use full refresh to perform a delta load to a target system in an environment with a manageable amount of source data.
- Changed Data Capture:
When an initial load is complete, you can extract only new or modified data and update the target system. Identifying and loading only changed data is called Changed Data Capture (CDC). CDC is recommended for large tables.
Changed Data Capture Benefits
- Improves performance because the job takes less time to process with less data to extract, transform, and load.
- The target system is able to track change history so that data can be correctly analyzed over time.
Setting up a full CDC solution within Data Services may not be required. Many databases now have CDC support built into them, such as Oracle, SQL Server, DB2 and SAP Sybase. Refer to the Designer Guide for more information: https://help.sap.com/docs/SAP_DATA_SERVICES/ec06fadc50b64b6184f835e4f0e1f52f/572111656d6d1014b3fc9283b0e91070.html?locale=en-US&q=preload%20sql
If you do want to set up a full CDC solution, you can choose source–based and/or target–based CDC.
CDC Solutions
- Source–based CDC evaluates the source tables to determine what has changed and only extracts changed rows to load into the target tables.
- Source–based CDC is preferable to target–based CDC for performance reasons.
- Some source systems do not provide enough information to make use of the source–based CDC techniques.
- Target–based CDC extracts all the data from the source, compares the source and target rows using a Table Comparison transform, and then loads only the changed rows into the target.
- You can use a combination of source-based and target-based techniques.