Identifying the Minimum Data Requirements and Distribution Criteria for the AI Workbench

Objective

After completing this lesson, you will be able to explore the sample data and scenario.

Introduction

The AI Workbench, a central component of the SAP Customer Data Platform’s insights capabilities, makes the power of data science accessible to business users and marketing analysts while providing advanced configuration options to data scientists. It leverages your customer data to run prediction operations, producing analytical insights, dashboards, and Predictive Indicators.

A user configures and runs the AI models provided by the AI Workbench. The AI Workbench uses a large volume of existing customer profile and activity data to infer predictive insights, and publishes the results as Churn and CLV Predictive Indicators.

Exploring the Minimum Data Requirements for the AI Workbench

To get the most out of the AI Workbench, you need to supply it with the minimum amount of customer data (profiles and activities) needed to produce meaningful analytical insights and predictive indicator values based on our pre-trained models and smart default settings (which can be customized). The minimum requirements, in terms of data volume, are as follows:

  • 20,000+ profiles or groups
  • 100,000+ orders

You can use the Customer Explorations feature to check if your Business Unit contains enough customer data to run the AI Workbench models.

Explaining Data Distribution

While most of you will run the models on top of your own customer data set, for this lesson we provide a synthetic sample customer data set. It is large enough to trigger the models and distributes the customer profile and activity data so that the models generate meaningful Predictive Indicator values based on groups of customer behaviors.

You already know the importance of setting the timestamp of every piece of customer data upon ingestion: it allows you to determine when the customer was first created, when their profile was last updated, when their orders were placed, when their support tickets were submitted, and so on. The timestamp is part of the Event Metadata, and it tells the system when that particular data point or event happened or changed, which in turn enables the SAP Customer Data Platform to plot the corresponding customer profile or activity on a timeline.
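As an illustration only, a timestamped activity event might look like the sketch below. The field names (`type`, `customerId`, `metadata`, `payload`) are assumptions for this example, not the actual SAP Customer Data Platform event schema:

```python
import json
from datetime import datetime, timezone

# Hypothetical "order placed" event; field names are illustrative
# assumptions, NOT the actual SAP Customer Data Platform schema.
# The timestamp in the event metadata is what lets the platform
# place this datapoint on the customer's timeline.
event = {
    "type": "orderPlaced",
    "customerId": "cust-001",
    "metadata": {
        "timestamp": datetime(2024, 5, 17, 14, 30, tzinfo=timezone.utc).isoformat(),
    },
    "payload": {"orderId": "ord-123", "amount": 24.99, "currency": "USD"},
}

print(json.dumps(event, indent=2))
```

Whatever the real schema looks like, the key point is the same: every profile and activity carries a timestamp in its metadata, so the platform can order events chronologically.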

The provided synthetic sample customer data contains profiles and orders up to one year old, and the customer profiling segregates customers into different groups based on shopping behavior simulations: customers either slow down their shopping habits (Churn) or place orders more frequently or for greater amounts or quantities (CLV). These habits result in different shopping performance (Predictive Indicator values) based not only on the data itself, but also on the models’ configuration.

Exploring a Simple Sample Scenario that Fulfills our Data Distribution Criteria

To get the most out of the AI Workbench, we produced synthetic sample customer data for model prediction using the following specifications:

  • 50,000 profiles, ingested in the system anywhere between 1 year and 2 months ago.
  • Two sets of order distributions per profile:
    1. The old orders set, which represents orders placed between the moment the profile was created and 2 months ago:
      • A random number of orders, between 5 and 10 per profile.
      • A random order amount between $20 and $30.
    2. The recent orders set, for orders placed within the last 2 months, with the following distribution:
      • 10% of the profiles should not have any orders in the last 2 months.
      • 20% of the profiles should place only one small order (less than $100).
      • 30% of the profiles should place between 4 and 8 small orders (each less than $100).
      • 20% of the profiles should place a single large order (more than $300).
      • 20% of the profiles should place between 4 and 8 large orders (each more than $300).

Diagram showing that 50,000 profiles were ingested over the past year and two months. Each profile has 5–10 historical orders, each worth between $20 and $30. For recent orders in the last two months: 10% of profiles had no orders, 20% had one small order (under $100), 30% had 4–8 small orders (each under $100), 20% had one large order (over $300), and 20% had 4–8 large orders (each over $300).
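The specification above can be sketched as a small Python data generator. This is a minimal illustration, not the tool that produced the actual sample set: the fixed "now", the bucket labels, and the exact amounts drawn within the stated bounds are all assumptions.

```python
import random
from datetime import datetime, timedelta

random.seed(42)  # reproducible sketch

NOW = datetime(2024, 1, 1)  # assumed "now" for the simulation
TWO_MONTHS_AGO = NOW - timedelta(days=60)

def make_profile(profile_id: int) -> dict:
    """Build one synthetic profile with old and recent order sets."""
    # Profile created anywhere between 1 year and 2 months ago.
    created = NOW - timedelta(days=random.randint(61, 365))

    # Old orders: 5-10 per profile, $20-$30 each, placed between the
    # profile's creation and 2 months ago.
    orders = []
    for _ in range(random.randint(5, 10)):
        placed = created + (TWO_MONTHS_AGO - created) * random.random()
        orders.append({"placed": placed, "amount": round(random.uniform(20, 30), 2)})

    # Recent orders: draw one of the five behavior buckets by its weight.
    bucket = random.choices(
        ["none", "one_small", "many_small", "one_large", "many_large"],
        weights=[10, 20, 30, 20, 20],
    )[0]
    count = {"none": 0, "one_small": 1, "many_small": random.randint(4, 8),
             "one_large": 1, "many_large": random.randint(4, 8)}[bucket]
    small = bucket in ("one_small", "many_small")
    for _ in range(count):
        placed = NOW - timedelta(days=random.randint(0, 59))
        amount = random.uniform(20, 99) if small else random.uniform(301, 500)
        orders.append({"placed": placed, "amount": round(amount, 2)})

    return {"id": profile_id, "created": created, "bucket": bucket, "orders": orders}

# 1,000 profiles here for speed; the lesson's dataset uses 50,000.
profiles = [make_profile(i) for i in range(1000)]
```

With a large enough profile count, the bucket shares converge on the 10/20/30/20/20 split from the specification.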

When reading the order distribution criteria used to produce such a controlled simulated data scenario, we can draw some simple conclusions:

  • 30% of our customer base are slow shoppers (no recent orders, or only a single small one).
  • 50% of them shop with what we consider "normal behavior" (several small orders, or a single large one).
  • 20% of them are avid shoppers (several large orders).
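These percentages are simple arithmetic over the five recent-order buckets. A minimal sketch (the bucket names are our own labels, not platform terminology):

```python
# Recent-order buckets and their share of profiles, taken from the
# distribution criteria above.
buckets = {
    "no_orders": 0.10,
    "one_small": 0.20,
    "many_small": 0.30,
    "one_large": 0.20,
    "many_large": 0.20,
}

# Grouping the buckets into the three shopper types named in the lesson.
slow = buckets["no_orders"] + buckets["one_small"]     # slow shoppers
normal = buckets["many_small"] + buckets["one_large"]  # "normal behavior"
avid = buckets["many_large"]                           # avid shoppers

print(round(slow, 2), round(normal, 2), round(avid, 2))  # 0.3 0.5 0.2
```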

These behaviors will also be reflected in the values of both the Churn and CLV Predictive Indicators after we run the AI Workbench models on this dataset.