Let's say you need to load some product data. You have two source tables, one with some product description information (label and category), and one with discount information ( discount date and discount percentage). Both identify products with its Product ID.
You want to load the data into a target table that should contain all product description and the discount percentage if any.
You could perform this load with a Query transform doing some outer join. But then, you would have NULL values for those products without any discount.
And what about the products having more than one discount? The job would fail trying to insert duplicate keys.
In this scenario, lookup functions will help you achieve your goal.
The Lookup Function
The lookup function uses a value that you provide in an expression to find a corresponding value in a file or different table. Specifically, the function searches for the row in the lookup_table where the value in the compare_column matches the value in expression. The function returns the result_column value from this matching row.
Let me show you an example:
But what if you have several matching rows ?
Multiple Matching Rows Handling
For standard RDBMS tables, the lookup function finds the matching row with the maximum value in the result_column and returns that value.
For SAP application tables or flat files, the lookup function randomly selects a matching row and returns the value in the result_column for that row.
Advanced Lookup Functions
To further control the matching row returned by the lookup process, you can use lookup_ext or lookup_seq functions.
While all lookup functions return one row for each row in the source, they differ in how they choose which of several matching rows to return:
- Lookup does not provide additional options for the lookup expression.
- Use lookup_ext to specify an order by column and return policy, for example min or max , to return the record with the lowest or highest value in a given field.
- Use lookup_seq in matching records to return a field from the record where the sequence column, for example, effective_date , is closest to but not greater than a specified sequence value, for example, a transaction date.
Comparison of Join and Lookup
Let's recap and compare the two solutions:
- You can join to files (via a file format)
- The join can be performed by the database server (if tables only)
- You can return multiple values from source (for example First Name and Last Name)
- You can perform Inner or Left Outer join
- Unmatched values on outer join are always null
- You can choose between setting cache either on or off (e.g. whole table or none)
- You need fewer objects in the data flow (less cluttered)
- You can use Imported tables only (no template tables nor file formats)
- The lookup is performed on the job server
- You can return only one value (you can use Lookup_ext to return multiple columns from reference table)
- You can perform Left Outer Join only, but ability to set default value (e.g. something other than Null)
- You can control return policy if multiple rows match in case of one to many (e.g. Min/Max on a date field for oldest or newest)
- You can choose more sophisticated caching options