Setting options to improve performance


After completing this lesson, you will be able to:

  • Set options to improve performance

Data Caching

You can improve the performance of data transformations that occur in memory by caching as much data as possible. By caching data, you limit the number of times the system must access the database. Cached data must fit into available memory.

The following table contains operations that benefit from caching data in memory during data flow processing.

Operations that can Benefit from Data Caching

  • Joins: Because Data Services must read an inner source of a join for each row of an outer source, cache a source when it's used as an inner source in a join.
  • Table comparisons: Because Data Services must read a comparison table for each row of a source, cache the comparison table.
  • Lookups: When a lookup table exists in a remote database, cache the table in memory to reduce access times.
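The reasoning behind caching an inner join source can be illustrated with a minimal Python sketch (not Data Services code): a nested-loop join reads the inner source once per outer row, while a cached join reads it only once and indexes it in memory.

```python
# Illustrative sketch: why caching the inner source of a join reduces
# database access. fetch_inner is a callable that simulates reading the
# inner source from the database.

def nested_loop_join(outer_rows, fetch_inner, key):
    """Join without caching: the inner source is read for every outer row."""
    results = []
    for o in outer_rows:
        for i in fetch_inner():          # one simulated database read per outer row
            if o[key] == i[key]:
                results.append({**o, **i})
    return results

def cached_join(outer_rows, fetch_inner, key):
    """Join with caching: the inner source is read once and indexed by key."""
    index = {}
    for i in fetch_inner():              # single read of the inner source
        index.setdefault(i[key], []).append(i)
    results = []
    for o in outer_rows:
        for i in index.get(o[key], []):  # in-memory lookup, no database access
            results.append({**o, **i})
    return results
```

Both functions return the same joined rows; only the number of reads of the inner source differs, which is exactly the cost that caching removes.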

Caching Joins

The join operation in a Query transform uses the cache settings from the source, unless you change the setting in the Query editor.

In the Query editor, the cache setting is set to Automatic by default. The Automatic setting carries forward the cache settings from the source table.

The following table shows the relationship between cache settings in the source and cache settings in the Query editor, and the effective cache setting for the join.

Cache settings

Cache Setting in Source | Cache Setting in Query Editor | Effective Cache Setting
Yes                     | Automatic                     | Yes
No                      | Automatic                     | No
Yes                     | Yes                           | Yes
No                      | Yes                           | Yes
Yes                     | No                            | No
No                      | No                            | No

For more information on join caching, see the Performance Optimization Guide.
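The rule stated above — Automatic carries the source setting forward, while an explicit Query editor setting overrides it — can be sketched as a small Python function (an illustration, not a Data Services API):

```python
def effective_cache(source_setting, query_setting):
    """Resolve the effective cache setting for a join.

    "Automatic" in the Query editor carries the source table's cache
    setting forward; an explicit "Yes" or "No" overrides the source.
    """
    return source_setting if query_setting == "Automatic" else query_setting

effective_cache("Yes", "Automatic")   # "Yes"  (source setting carried forward)
effective_cache("Yes", "No")          # "No"   (Query editor overrides source)
```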

Data Cache Types

You can use two different types of caching for the operations in a data flow.

  • In-memory: Use when your data flow processes a small amount of data that fits in memory.
  • Pageable cache: Use when your data flow processes a large amount of data that doesn't fit in memory.
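The choice between the two cache types follows a simple sizing rule, sketched below in Python. The function name and parameters are illustrative only, not part of any Data Services API:

```python
# Hypothetical sizing check: use in-memory cache only when the estimated
# data volume fits in the memory available to the data flow process.

def choose_cache_type(estimated_bytes, available_bytes):
    """Return the cache type a data flow might use, per the rule above."""
    return "in-memory" if estimated_bytes <= available_bytes else "pageable"

choose_cache_type(500 * 2**20, 2 * 2**30)   # 500 MB fits in 2 GB -> "in-memory"
choose_cache_type(8 * 2**30, 2 * 2**30)     # 8 GB exceeds 2 GB  -> "pageable"
```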

Administrators select a pageable cache location to save content over the 2 GB RAM limit. The pageable cache location is set up in Server Manager and the option to use pageable cache is selected on the Data flow Properties dialog box.

Create persistent cache datastores by selecting Persistent Cache as the database type in the Create New Datastore dialog box. The newly created persistent cache datastore shows in the list of datastores and can be used as a source in jobs.

Bulk Loading

Use bulk loading when you need to load large amounts of data with good performance. Instead of inserting data one row at a time, bulk loading uses more efficient methods based on the structure of the specific database.

SAP Data Services supports bulk loading for most supported databases, enabling you to load, and in some cases read, data in bulk rather than through individual SQL statements.

Bulk Loading Considerations

The following list contains some general considerations when you use bulk loading and reading:

  • Specify bulk-loading options on the Data Services target table editor in the Options and Bulk Loader Options tabs.
  • Most databases don’t support bulk loading with a template table.
  • The operation codes that you can use with bulk loading differ between databases. Most of them support NORMAL and INSERT operation codes but only some of them support UPDATE and DELETE.
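The performance difference between row-at-a-time inserts and batched loading can be demonstrated with a small sketch. This uses Python's sqlite3 as a stand-in database; real bulk loaders rely on database-specific utilities rather than SQL, but the principle — submitting many rows per call instead of one — is the same:

```python
# Sketch of the idea behind bulk loading: batching rows into a single
# call avoids per-statement overhead. sqlite3 stands in for the target
# database here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
rows = [(i, f"row{i}") for i in range(1000)]

# Row-at-a-time loading: one INSERT statement per row.
for r in rows[:500]:
    conn.execute("INSERT INTO target VALUES (?, ?)", r)

# Batched loading: one call submits many rows at once.
conn.executemany("INSERT INTO target VALUES (?, ?)", rows[500:])
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]  # 1000
```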

Join Ranking

You can improve the performance of join operations in the Query transform by assigning a join rank to each source in the join.

When you rank each join:

  • SAP Data Services considers the rank relative to other tables and files joined in the data flow.
  • The optimizer, which is the optimization application inside the Data Services engine, joins sources with higher rank values before joining sources with lower rank values.
  • The order of execution depends on join rank and, for left outer joins, the order defined in the FROM clause.
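The ordering rule — higher-ranked sources join before lower-ranked ones — can be sketched in a few lines of Python. The source names and rank values are made up for the example:

```python
# Illustrative sketch: the optimizer joins higher-ranked sources first,
# which sorting by descending join rank mimics.

sources = [
    {"name": "ORDERS",    "join_rank": 10},
    {"name": "CUSTOMERS", "join_rank": 30},
    {"name": "ITEMS",     "join_rank": 20},
]

join_order = sorted(sources, key=lambda s: s["join_rank"], reverse=True)
[s["name"] for s in join_order]   # ['CUSTOMERS', 'ITEMS', 'ORDERS']
```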

Combining Joins

The optimizer determines how to join your data in the following ways:

  • The optimizer can combine the joins from consecutive Query transforms into a single Query transform, reassigning join ranks.
  • The optimizer can consider the upstream join rank settings when it makes joins.

Process Slicing

You can also optimize your jobs with process slicing by splitting data flows into sub data flows.

Sub Data Flows

  • Work on smaller data sets and/or fewer transforms so each process consumes less virtual memory.
  • Leverage more physical memory per data flow as each sub data flow can access 2 GB of memory.

Process slicing is available on the Advanced tab of the Query transform, where you can run each memory-intensive operation as a separate process.
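The idea behind process slicing can be sketched with Python's multiprocessing module: split the data into slices and hand each slice to a separate process, so no single process holds the whole data set. This mimics the concept only, not the Data Services engine:

```python
# Conceptual sketch of process slicing: each sub-process works on a
# smaller slice of the data, so each consumes less memory.
from multiprocessing import Pool

def split_slices(data, n_slices):
    """Split data into n_slices roughly equal parts, one per sub data flow."""
    size = (len(data) + n_slices - 1) // n_slices
    return [data[i:i + size] for i in range(0, len(data), size)]

def transform(rows):
    """Stand-in for a memory-intensive operation applied to one slice."""
    return [r * 2 for r in rows]

if __name__ == "__main__":
    slices = split_slices(list(range(8)), 2)     # [[0..3], [4..7]]
    with Pool(len(slices)) as pool:              # one process per slice
        results = pool.map(transform, slices)
    merged = [row for part in results for row in part]
```

Merging the per-slice results yields the same output as processing the whole data set in one process; only the peak memory per process changes.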
