
You can improve the performance of data transformations that occur in memory by caching as much data as possible. By caching data, you limit the number of times the system must access the database. Cached data must fit into available memory.
The following table contains operations that benefit from caching data in memory during data flow processing.
Operations that can Benefit from Data Caching
Operation | Description |
---|---|
Joins | Because Data Services must read an inner source of a join for each row of an outer source, cache a source when it's used as an inner source in a join. |
Table comparisons | Because Data Services must read a comparison table for each row of a source, cache the comparison table. |
Lookups | When a lookup table exists in a remote database, cache the table in memory to reduce access times. |
Caching Joins
The join operation in a Query transform uses the cache settings from the source, unless you change the setting in the Query editor.
In the Query editor, the cache setting is set to Automatic by default. The Automatic setting carries forward the cache settings from the source table.
The following table shows the relationship between cache settings in the source and cache settings in the Query editor, and the effective cache setting for the join.
Cache settings
Cache Setting in Source | Cache Setting in Query Editor | Effective Cache Setting |
---|---|---|
Yes | Automatic | Yes |
No | Automatic | No |
Yes | Yes | Yes |
No | Yes | Yes |
Yes | No | No |
No | No | No |
Note
For more information on join caching, see the Performance Optimization Guide:Data Cache Types
You can use two different types of caching for the operations in a data flow.
- In-memory: Use when your data flow processes a small amount of data that fits in memory.
- Pageable cache: Use when your data flow processes a large amount of data that doesn't fit in memory.
Administrators select a pageable cache location to save content over the 2 GB RAM limit. The pageable cache location is set up in Server Manager and the option to use pageable cache is selected on the Data flow Properties dialog box.
Create persistent cache datastores by selecting Persistent Cache as the database type in the Create New Datastore dialog box. The newly–created persistent cache datastore shows in the list of datastores, and can be used as a source in jobs.
Note
For more information, see the Designer Guide: