Use Modeling Options to Integrate Data from Different Sources

Objective

After completing this lesson, you will be able to use modeling options to integrate data from different sources

Integration of Data from Different Sources

Scenario

For our case study, let's look at the situation and requirements related to data in ITelO from different sources.

Situation

  • The same real-world object is represented by different identifiers in different source systems.

  • Different real-world objects can by chance have the same identifier in different source systems.

  • Corresponding values have different technical format.

Requirements

  • The same real-world object must be represented by the same SAP BW/4HANA characteristic value.

  • Different real-world objects must be represented by different characteristic values.

  • Different formats and different currencies must be harmonized.

  • You want to generate a minimum of transformation rules and data records.

Integrating data from different sources into one common storage is a typical task when building a data warehouse.

The data from different sources, such as tables in different source systems or different tables in the same source system, is collected in the data warehouse.

First, you must identify the source fields that contain identifiers for a specific real word entity, for example, product. Then, you must ask the following questions:

  • Does the same product have the same ID in both sources?
  • Do different products have different IDs?

If the answer is yes to both of these questions, no change is required to the corresponding fields during data integration.

If the answer is no to these questions, then the following tasks have to be performed:

  • Separate different objects.

    This process separates objects that have the same identifier in different sources.

  • Map different values.

    This process determines a common identifier for each real-world object that replaces the different original identifiers from different sources.

Decide what kind of object must be created to generate the result set. There are different scenarios, depending on the degree of data transformation.

Integration Tasks

The following tasks are needed to integrate data from different sources:

  1. Technical harmonization:

    Find the corresponding original fields for each entity, and compare the formats of these fields. Determine a common format that is general enough to contain all different values. Consider the future.

    Note

    A field in one source may have a smaller length or a more restrictive data type than the corresponding field in another source. If so, find the most general data type that can contain all values. For example, if one source has a NUMC(10) field with up to 10 digits, and the other source has a NUMC(13) field with up to 13 digits, choose at least NUMC(13). If one source has a NUMC(10) field with 10 digits, and the corresponding field of the other source is CHAR(10) with 10 letters, then choose CHAR(10).
  2. Content separation:

    Distinguish different objects from different sources.

  3. Content harmonization:

    Harmonize different values representing the same object into a common SAP BW/4HANA identifier.

Content Separation

Suppose that you load values from two sources into the same data warehouse table, and each record has a unique order identifier (ID) within its source. You want to avoid overwriting a sales order of one source with the sales order of the other source. Therefore, you must make sure that the key values in the data warehouse are different if the orders come from different sources.

Storing separate keys for separate sources has the following advantages:

  • No matter what source is loaded first, the resulting integrated data in the data warehouse is the same.
  • You can model the integrated data as the union of the sources.

Separating keys can be achieved in the following ways:

  • Number ranges:

    The number range approach consists of separating the possible key values in the sources by defining nonoverlapping number ranges for each source.

  • Key Extension (Compounding):

    If different objects can be delivered with the same original key, extend the key to a combined key of original value and source.

  • Concatenation:

    If different objects can be delivered with the same original key, concatenate the source and the original key into a long key field.

Separation by Using Number Ranges

If you can design the process of generating key values in the sources, define different, nonoverlapping number ranges for each source. Make sure that each source creates key values of its own number range only. Then, you can take the ID values as they are into the data warehouse.

Having the same IDs in the data warehouse and in the source system has the following advantages:

  • No transformation is required.
  • Reports created in the data warehouse and reports created in the source system show the same ID values.
  • It is easier to identify the corresponding entries in the source when you are investigating a specific order.
  • It is possible to join tables from the source with data warehouse tables.

Even if you have no control over the number ranges, check if the sources provide already disjoint sets of identifiers. For example, sales and purchase orders are taken from different number ranges in most SAP applications. In addition, look at the fields' formats. There cannot be overlaps if all ID numbers are purely numerical values in one source, and each ID starts with two letters in the second source. Check which rules apply for generating the ID values. For example, license plate numbers have different formats in different countries/regions. Check if rules for the generation of key values, in combination with the field format, ensures nonoverlapping sets of values. Imagine, for example, the following situation: in one source, 10-digit order numbers start with 1, and in the other source, 13-digit order numbers start with 1. They will be different.

Note that some ranges seem to be different, but in fact, have overlaps. If the key is the date, and the sources correspond to different years, normally there is no overlap. However, consider the following situation: A value for December 31, 2016, is written into the 2016 storage during the day and on January 1, 2017, an updated value for December 31 is written to the new year's storage. In this case, the same date (December 31, 2016) occurs in both sources. To avoid overwriting one value with the other, you must create a transformation that includes information about the source into the key.

Separation by Using Key Extension (Compounding)

The method Key Extension (or Compounding) shows a different way to keep values from different sources apart.

Suppose that the same original key value exists in different sources, representing different entities and you want to differentiate them in the data warehouse. The easiest way is to add a source identifier to the key. This method has the following advantages:

  • You can easily identify from which source a record comes.
  • Records from different sources remain separate.
  • The necessary transformation is simple: the order ID is assigned 1:1 from the sources, and a source ID is generated as a constant value.
  • It is easy to filter for a source or an original order number.
  • SAP BW/4HANA provides a technical concept called characteristics compounding, in the InfoObject definition for this method.

Compounding the source ID to the original key has the following disadvantages:

  • The transformation has to be performed in the same way for all loading processes or views (master data and transaction data).
  • To understand the data, the business user must understand the concept of a combined key.
  • To avoid wrong aggregation, the field source must be included in all objects containing order.
  • More fields need to be modeled.
  • Incorrect combinations of filters for source and order lead to an empty result set.

The SAP BW/4HANA feature characteristics compounding supports this method.

Separation by Using Concatenation

The method Concatenation means that the first characters of the data warehouse order key are taken from an ID for the source. The latter characters from the original order key, and together they form a single key field.

The Concatenation concept has the following disadvantages:

  • You need to implement more complex transformations, involving a formula, or a program.
  • You need a range selection to filter for the source, for example, A0000 - A9999

The Concatenation concept has the following advantages:

  • One field is sufficient to distinguish different orders in the report.
  • If another field needs to be included into the key, change the transformation and reload the values. The objects remain unchanged.
  • If this concatenation is not needed for a specific source system, you can check the source in the formula or program coding.

You can use a global transfer routine for the SAP BW/4HANA characteristic to implement the concatenation approach.

Content Harmonization of Master Data

For content harmonization, a unique identifier for the same object has to be defined. It can also involve corrections related to bad quality of source data.

For example, the same vendor is represented in different sources with different key values. Which value must be valid in the data warehouse? An employee has entered the wrong text for a vendor in the source. How can this error be corrected in content harmonization?

How can values from different sources be identified and corrected in SAP BW/4HANA?

A mapping table is a table that contains the corresponding New key values for each source and original key value. For the original key format, use the most general format as recognized during the technical harmonization.

For performance reasons, we recommend using a DataStore Object (advanced) in SAP BW/4HANA for storing the mapping table. Then, you can use a transformation rule read from data store object (advanced) instead of a routine to derive the new key values.

Store the derived New key in the SAP BW/4HANA master data tables. If the original key values are still needed, you simply add the New key as an attribute of the original vendor. If there is no need to display or use the original values, use the New key instead of the original key. Then only one record per New key remains, which reduces the number of master data entries and makes the master data tables smaller.

Content Harmonization of Transactional Data

You can also use a mapping table for harmonizing transactional data. You can check the referential integrity. This means that if no matching entry exists in the mapping table, an error message is generated. Alternatively, you can generate a workflow. A business user (or a smart program) checks if this value can be identified with an existing New key, or if another New key must be generated.

Suppose that you expect only one value for an entity. For example, value type must always be Actuals, or a unit of measure must always be Pieces. However, the source field is empty or doesn't exist. In this case, you don't need a mapping table with one entry: you fill such a field with a constant value. You may want to generate an error when a different value appears in the source.

In a default value scenario, you can combine a mapping table and a constant. First, check if a matching entry for a given record is listed in the mapping table. If not, choose always the same default value. A typical default value is Others.

How can other harmonization issues be resolved?

In this example, source records contain amounts in different currencies, and different additional information. In the example, source A contains order numbers, but source B does not. To build a compatible table structure in the central SAP BW/4HANA, on the global level, several harmonization tasks are necessary:

  • Generate order numbers for source B or store the sum for each vendor.

  • Ensure that the order numbers in the central table are different, when order numbers are taken from different sources.

  • To map the vendor ID to a global SAP BW/4HANA vendor key, use a mapping table.

  • Source B lists information about countries, but no tax information. There is a country master data table from which the currency and tax rate can be derived. Join these two tables to find the tax rate.
  • Calculate the absolute tax value.
  • Convert all different values to USD (US $).
  • Standardize the number format as it is different.

Depending on the modeling focus for this scenario, these tasks are performed using SAP BW/4HANA transformations, or in SAP HANA, by calculation views.

Best Practise of Combining Two Sources

What objects are needed to integrate data from different sources? It depends on the complexity and on the modeling focus (SAP BW/4HANA, with, or without persistent target, or SAP HANA with virtual views, or a mixed scenario). Try to model as few physical layers as possible using CompositeProviders, InfoSources, and SAP HANA Calculation Views to combine data.

In simple scenarios, if both sources have a common field and the format and the values already match, a join can be performed directly without transformation. With SAP HANA as the database, joins can be executed in memory whenever needed. If the original values remain available, there is no need to save the join result permanently.

Imagine, for example, a master data table and a text table use the same technical key. Joins can be defined as simple database joins, or in the following different ways using SAP BW/4HANA or SAP HANA features:

  • An SAP HANA calculation view can contain inner, outer joins, referential, temporal, and text joins.
  • An Open ODS View can be associated to another Open ODS View, especially when referential integrity is enforced, for example, to combine attributes and texts.
  • A CompositeProvider can contain inner and outer joins.

In different scenarios, the sources do not deliver all fields in the format or with the data that is required for the join. In such cases, combine the transformations with a join.

Find out which transformations are specific to one of the sources (source-specific transformations), and which transformations are valid for all sources (source-independent transformations).

For the following examples, use source-specific transformations:

  • Format harmonization: Vendor identifiers have 10 characters in one source, and 13 digits in the other source, and must be enlarged to a common 13-character format
  • Key extension: In the context of SAP BW/4HANA characteristics compounding, add the specific source-system as a constant.
  • Deriving missing values: Suppose one source delivers tax rate and net amount, and the other source delivers tax amount. Then, the calculation of the tax amount is only necessary for one source.
  • Applying source-specific mapping tables: Suppose the vendor values of one source are taken as the New key. Then, mapping is only necessary for original vendor values from the other source.

In the following cases, use source-independent transformations:

  • Applying mapping tables: From a common mapping table, read the harmonized SAP BW/4HANA value that corresponds to the combination of source ID and original vendor ID.
  • Currency conversion: Use the same technology to convert different currencies into the same common currency, such as euro.
  • Common mathematical formulas: If both sources contain net amount and tax amount, calculate gross amount.

When you must generate reliable harmonized values, you store the result of these transformations permanently (refer to the middle part of the figure). We recommend that you implement an SAP BW/4HANA data flow with an InfoSource as a nonpersistent intermediate layer, and a DataStore Object (advanced) as a persistent storage of the result. Split transformations in a way that you do not need to create the same transformation twice.

Implement the source-specific parts of the transformation before the InfoSource. If there are parts of transformations that are independent of the source, they can be implemented between the InfoSource and the target.

Use hierarchy of calculation views when you must report in real time on harmonized values (refer to the right part of the figure), and the result doesn't need to be saved permanently. In a similar way, source-specific formulas can be implemented in the first layer, and source-independent formulas in a higher level layer. At the end, a union or join can be performed in a CompositeProvider.

If you are not sure what scenario to choose, choose the more flexible options with transformations.

Overview of Integration Tasks

Integration Tasks

Tasks1 Technical Harmonization: determine corresponding fields and a common format
2 Content Separation: distinguish different objects...3 Content Harmonization: determine a unique identifier for the same object...

… without existing representation:

Generate new values using number ranges.Add a constant.

… represented always by the same value:

Use key extension (compounding), or source concatenation.(No transformation).

… represented by the same or by different values:

Use key extension (compounding), or source concatenation.Read from a mapping table.

… represented always by different values:

(No transformation).Read from a mapping table.

So, finally, let's have a look at an overview of the integration tasks.

First, find the corresponding original fields for each entity, and compare the formats of these fields. Determine a common format that is general enough to contain all different values. Consider the future.

Second, distinguish different objects from different sources. If no identifiers are available, generate new values within a source-specific number range in a source-specific transformation. If different objects can be delivered with the same original key, extend the key to a combined key of original value and source. Alternatively, you can concatenate the source and the original key into a long key field. If you are sure that now and in future, the sources already deliver different values, no transformation for this step is needed.

Finally, harmonize different values representing the same object into a common SAP BW/4HANA identifier. If no value is presented for an entity with only one object, add a constant. If different values exist, define a mapping table and read the corresponding identifier from there.

Log in to track your progress & complete quizzes