Setting Up Identity Resolution

Objective

After completing this lesson, you will be able to change the way customer data is matched to existing profiles and decide how it is merged during ingestion.

Introduction

In this lesson, we will explore the configuration options that SAP Customer Data Platform provides to define whether incoming customer data should be merged with existing customer data during ingestion. We will see how to define identifier attributes, how to change their priority order and quality precedence, and how to determine what happens when newly ingested customer data conflicts with existing customer data.

Getting to Know Identity Resolution

In simple terms, Identity Resolution defines which attributes will be used to identify customer data, and what happens when newly ingested customer data contains an identifier attribute value already present in existing customer data.

Identity Resolution in SAP Customer Data Platform is divided into two areas: Matching Rules specify which attributes are used as customer data identifiers, while Merge Rules define what happens when ingested customer data is matched to existing customer data during ingestion.

As this topic can be confusing, we will first go through the capabilities and setup of Matching Rules, and then do the same for Merge Rules. Next, we will describe how both work together, to give you a full understanding of Identity Resolution in action. Finally, we will set up a small scenario to demonstrate how these configurations can affect the results of customer data ingestion.

Defining Matching Rules

Matching Rules are built on Application Identifiers. Any top-level Profile attribute, including array attributes, can be set as an Application Identifier using the Customer Schemas functionality. Because Application Identifiers are used to recognize customers across ingestions, you usually want them to be immutable (meaning their value does not change), which is why it’s important to select them carefully.

Although it is optional, this is usually the first step when configuring Identity Resolution: check whether the desired attribute is already set as an Application Identifier, and set it if it is not.

The primaryEmail attribute of the Profile schema, highlighted in the Customer Schemas functionality, is set as an Application Identifier.

The next step is making sure your attribute is also present in the Matching Rules. This step is obligatory if you want your attribute to be considered for Identity Resolution. The system provides you with a pre-defined template for Matching Rules, divided into two groups: system-defined and user-defined.

The system-defined Matching Rules cannot be modified or removed, while user-defined ones can be changed or erased.

There is currently only one attribute in the system-defined Matching Rules: cdpId. It is an internal attribute whose value is generated automatically during ingestion; you can think of it as the internal profile ID assigned by the system. The cdpId attribute cannot be changed or removed on the Customer Schemas Profile page, so if you don’t need it, you can simply ignore it.

All other Matching Rules are user-defined and can be changed, reordered, or removed. You can also create new user-defined Matching Rules.

When customer data is matched during ingestion by one of the Matching Rules, you can configure whether the incoming data is merged into the existing data (the Merge Duplicates option) or stored as a duplicate (the Keep Duplicates option). With Keep Duplicates, a new customer record is created with the same identifier value. Merge Duplicates is the usual choice.
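To make the difference between the two options concrete, here is a minimal Python sketch that treats customer profiles as plain dictionaries held in memory. It is a toy model, not the SAP Customer Data Platform API: the names DuplicateOption, MatchingRule, and ingest are hypothetical, and the merge step simply copies every incoming value, ignoring the data quality rules described later in this lesson.

```python
from dataclasses import dataclass
from enum import Enum


class DuplicateOption(Enum):
    """Hypothetical stand-ins for the two Matching Rule options."""
    MERGE = "Merge Duplicates"
    KEEP = "Keep Duplicates"


@dataclass
class MatchingRule:
    """Hypothetical single-attribute Matching Rule with its duplicate option."""
    attribute: str
    option: DuplicateOption


def ingest(profiles: list[dict], incoming: dict, rule: MatchingRule) -> list[dict]:
    """Apply one Matching Rule to an incoming record, updating profiles in place."""
    value = incoming.get(rule.attribute)
    match = next(
        (p for p in profiles if value is not None and p.get(rule.attribute) == value),
        None,
    )
    if match is not None and rule.option is DuplicateOption.MERGE:
        # Merge Duplicates: copy the incoming values into the matched profile.
        match.update(incoming)
    else:
        # No match, or Keep Duplicates: store the record as a new profile.
        profiles.append(dict(incoming))
    return profiles
```

With Merge Duplicates, ingesting a second record for ana@example.com leaves a single profile that now carries both records' values; with Keep Duplicates, the same ingestion adds a second profile with the same primaryEmail value.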

It is also worth noting that multiple attributes can be part of the same Matching Rule. In that case, the system will concatenate the existing values and match the result against the concatenation of the corresponding ingested attribute values.
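The following minimal Python sketch illustrates the concatenation idea for a Matching Rule made up of several attributes. The attribute names loyaltyId and country and the helper functions match_key and matches are hypothetical, and treating a record with a missing attribute as non-matching is an assumption of the sketch, not documented system behavior.

```python
from typing import Optional


def match_key(record: dict, attributes: list[str]) -> Optional[str]:
    """Concatenate the record's values for the rule's attributes, in rule order.

    Returns None when any attribute is missing (an assumption of this sketch),
    so incomplete records never match.
    """
    values = [record.get(attr) for attr in attributes]
    if any(v is None for v in values):
        return None
    return "".join(str(v) for v in values)


def matches(existing: dict, incoming: dict, attributes: list[str]) -> bool:
    """True if both records produce the same concatenated key for the rule."""
    key = match_key(existing, attributes)
    return key is not None and key == match_key(incoming, attributes)
```

Because the values are joined without a separator in this sketch, ("12", "3DE") and ("123", "DE") produce the same key; the lesson does not say whether the real system guards against such collisions.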

The Matching Rules page, with the user-defined masterDataId Matching Rule being edited. The rule order and the Merge Duplicates and Keep Duplicates options are also shown. The bar charts on the right-hand side of the page show usage statistics for some of the Matching Rules, based on past ingestions.

Configuring Merge Rules

Merge Rules define what happens when a match is found for an Application Identifier during ingestion. You can either follow the option specified in the Matching Rule configuration (Merge Duplicates or Keep Duplicates) or discard the incoming data entirely. Note that this option is only available if the attribute’s Application Identifier setting is checked.
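The decision described above can be sketched in Python as follows. This is a hypothetical model, not the product API: the IdentifierAttribute class, its field names, and the returned strings are illustrative stand-ins for the Application Identifier checkbox, the “Discard event if duplicate exists” Conflict Policy, and the Matching Rule option.

```python
from dataclasses import dataclass


@dataclass
class IdentifierAttribute:
    """Hypothetical model of a Profile attribute's identity settings."""
    name: str
    application_identifier: bool
    # Stand-in for the Conflict Policy "Discard event if duplicate exists".
    discard_if_duplicate_exists: bool = False

    def __post_init__(self) -> None:
        # The conflict policy is only available for Application Identifiers.
        if self.discard_if_duplicate_exists and not self.application_identifier:
            raise ValueError(
                f"{self.name}: conflict policy requires Application Identifier"
            )


def resolve(attr: IdentifierAttribute, match_found: bool, matching_rule_option: str) -> str:
    """Return 'create', 'discard', or the Matching Rule option ('merge' or 'keep')."""
    if not match_found:
        return "create"
    if attr.discard_if_duplicate_exists:
        # Merge Rule conflict policy: drop the incoming data.
        return "discard"
    # Otherwise follow the Matching Rule: Merge Duplicates or Keep Duplicates.
    return matching_rule_option
```

For example, resolve(IdentifierAttribute("primaryEmail", True, True), True, "merge") returns "discard", while the same call with the conflict policy left unchecked returns "merge".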

You will find this feature in the Merge Rules tab when creating or editing a Profile attribute in the Customer Schemas functionality.

The Merge Rules tab of the Profile attribute editor in the Customer Schemas functionality. The Conflict Policy option “Discard event if duplicate exists” is unchecked, so a matching event won’t be discarded.

It is also worth noting that each Source Application has a Data Quality Rank setting. This setting is used when the ingested data corresponds to an existing customer record that already holds data for an attribute. In that case, the Data Quality Rank of the Source Application performing the ingestion is compared with the Data Quality Rank of the existing attribute data. If the incoming rank is equal or higher, the existing attribute data is overwritten; otherwise, the incoming attribute data is discarded.

This comparison is performed separately for each attribute whose data could be merged as part of the ingestion pipeline.
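A minimal Python sketch of this per-attribute comparison follows, assuming that each stored value remembers the Data Quality Rank of the Source Application that wrote it and, as described above, that an equal or higher rank wins. The AttributeValue class and merge_attributes function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeValue:
    """A stored value plus the Data Quality Rank of the Source Application that wrote it."""
    value: object
    rank: int


def merge_attributes(
    existing: dict[str, AttributeValue],
    incoming: dict[str, object],
    source_rank: int,
) -> dict[str, AttributeValue]:
    """Merge incoming values into a copy of an existing profile, one attribute at a time."""
    merged = dict(existing)
    for name, value in incoming.items():
        current = merged.get(name)
        if current is None or source_rank >= current.rank:
            # No existing data, or an equal or higher rank: the incoming value wins.
            merged[name] = AttributeValue(value, source_rank)
        # Otherwise the incoming value for this attribute is discarded.
    return merged
```

With an existing firstName written at rank 5, an ingestion from a rank 3 Source Application leaves firstName untouched while still overwriting any attribute stored at rank 3 or lower and filling in attributes that had no data yet.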

The Data Quality Rank setting is part of the Source Application configuration. Here you define the level of customer data quality that this application provides to your business unit.

Full Picture of Matching and Merge Rules

When Matching Rules are evaluated during ingestion and no existing customer matches any of the Event-mapped Application Identifiers, a new customer is created.

If, however, the Matching Rules find one or more customers whose identifiers match the Event-mapped Application Identifiers, different actions can be triggered, depending on the Matching Rule settings and on the policies configured to handle Application Identifier conflicts.

In this case, one of the following outcomes is possible:

  • The customer Event is discarded
  • One customer is updated
  • Multiple customers are merged
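The choice between these outcomes can be approximated with the following Python sketch. It is a simplified, hypothetical model: the Outcome enum and resolve_event function are illustrative names, the order in which the checks are made is an assumption, and the per-attribute data quality rules are left out. The Keep Duplicates branch follows the Matching Rules section, where that option creates a new customer record with the same identifier value.

```python
from enum import Enum


class Outcome(Enum):
    """Possible results of ingesting one customer Event (toy model)."""
    CREATE = "A new customer is created"
    DISCARD = "The customer Event is discarded"
    UPDATE_ONE = "One customer is updated"
    MERGE_MANY = "Multiple customers are merged"


def resolve_event(
    matched_ids: set[str],
    discard_if_duplicate_exists: bool,
    keep_duplicates: bool,
) -> Outcome:
    """Decide what happens to an Event, given the customers found by Matching Rules.

    matched_ids: hypothetical IDs of existing customers whose identifiers match
    the Event-mapped Application Identifiers.
    """
    if not matched_ids:
        # No match on any Matching Rule: a new customer is created.
        return Outcome.CREATE
    if discard_if_duplicate_exists:
        # Merge Rule conflict policy: drop the incoming Event.
        return Outcome.DISCARD
    if keep_duplicates:
        # Keep Duplicates: a new customer with the same identifier value.
        return Outcome.CREATE
    if len(matched_ids) == 1:
        return Outcome.UPDATE_ONE
    # Several existing customers matched: they are merged into one.
    return Outcome.MERGE_MANY
```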

In addition, the data quality rules apply to all non-identifier attributes: on a per-attribute basis, the system discards incoming data whenever the Source Application used for the ingestion has a lower Data Quality Rank than the existing data.

As the full flow and its possible outcomes are difficult to describe in words, consider the diagrams below, which illustrate how different Matching Rule and Merge Rule configurations affect the way incoming data is ingested into your business unit.

Flowchart demonstrating, step by step, how Matching Rules settings can affect the ingestion of customer data.

Continuation of the previous flowchart, demonstrating, step by step, how Merge Rules settings can affect the ingestion of customer data. MR stands for Matching Rules.

Some quick notes on the diagrams above:

  • Event Matching Rules refers to all Application Identifiers that are part of the Matching Rules linked to the Profile Schema during Event configuration.
  • Matching Rules refers to all Matching Rules configured in the system.
  • Attribute data quality is taken from the configuration of the Source Application that the ingestion Event belongs to.

Common Scenarios for Identity Resolution

Summary

In this lesson, you learned how certain attributes are used to identify customer data, a process known as Identity Resolution. You now understand that there are both system-defined and user-defined Matching Rules that perform this task, and that system-defined Matching Rules cannot be modified or removed. You also learned that duplicates can be kept, merged, or discarded. Finally, you were introduced to the concept of data quality ranking, and how such ranking can be used to favor the data you have more confidence in during the merging process.