Cleaning Data Automatically

Objective

After completing this lesson, you will be able to prepare data automatically.

Gather and Cleanse Historical Data

To create forecasts for the future you need data from the past, which usually means the sales history of a product from a certain time period. This data is gathered from SAP ERP or APO DP via CPI-DS or by uploading of CSV files using the Data Integration Jobs app. When the data is available, it can be checked for missing values, outliers, and promotion-related sales lifts, and consolidated with the help of the pre-processing algorithms selected for the forecast model you’re using.

There are three main methods to cleanse your data:

  • Substitute Missing Values
  • Outlier Correction (including Automatic Outlier Correction)
  • Promotion Sales Lift Elimination
Overview of pre-processing options in SAP IBP. Including Substitute Missing Values, Outlier Correction and Promotion Sales Lift Elimination. Additionally data can be cleansed manually.

Historical Data Outlier Correction

An "outlier" is a value in the historical data that lies outside the expected range of values. There can be a variety of reasons for this deviation, for example a data entry error or a one-time event that affected the sales results.

Graphical representation of a time-series with a peak, that can be identified as outlier, as it is outside the tolerance area of the historical data set.

Remark: When it comes to seasonal products, automatic outlier correction might by difficult and needs to be carefully validated as the values might get falsified as the season could get detected as outlier.

Missing Value Substitution

  • Full Horizon: Substitution will take place for historic and future data contained in the specified key figure
  • History Only: Substitution will take place for historic data contained in the specified key figure
  • Future Only: Substitution will take place for future data contained in the specified key figure

You can add the Substitute Missing Values algorithm more than once to your model. For example, you may want the system to substitute missing sales history values from the past with the mean of the existing values, and missing price values from the past and the future with the median of the existing values.

Models for Outlier Detection

There are two models in SAP Integrated Business Planning (SAP IBP) that help to detect outliers in your historical data:

  1. Interquartile Range Test: The system checks whether the time series values are within the interquartile range, which is the difference between the third quartile and the first quartile of the data. The values that are not within this range are identified as outliers.The figure describes the Interquartile Range Test, how the system checks whether the time series values are within the interquartile range, which is the difference between the third quartile and the first quartile of the data.
  2. Variance Test: The system checks whether the time series values deviate from the mean by more than the standard deviation multiplied by a constant. The values showing larger deviations are identified as outliers.The figure describes the Variance Test, how the system checks whether the time series values deviate from the mean by more than the standard deviation multiplied by a constant.

Outlier Detection Bounds

If data values are out of bounds, the system corrects the values. The following figure shows the upper-bound and the lower-bound.

The figure shows the upper-bound and the lower-bound. If data values are out of bounds, the system corrects the values.

Outlier Correction Methods

Clean Historical Sales Data

Task 1: Create a Model for Automatic Outlier Cleaning

Besides manual cleaning, SAP IBP comes with functionality that can clean the outliers automatically. We set this up in this task.

Steps

  1. Go to the SAP Fiori app Manage Forecast Models. Create a new forecast profile called OutlierCorrection## that detects outliers using the variance test in 12 periods of weekly history and correct them using the mean method. Detect outliers in the Actuals Qty key figure and store the corrections in the ADJDELIVQTY key figure.

    FieldValue
    Model NameOutlierCorrection##
    PeriodicityWeekly
    Historical Periods12
    Outlier Detection MethodVariance Test
    Multiplier1
    Outlier Correction MethodCorrection with Mean
    Input for AlgorithmActuals Qty
    Save Result inDelivered Qty. Adjusted
    1. Log in to the SAP Fiori user interface with the login credentials provided by the instructor.

    2. In the Demand Planner group, choose the Manage Forecast Models app.

    3. Choose Create Model, and select the planning area ZSAP6.

  2. Enter the details of the new forecast profile using information from the table provided.

    1. Enter the model name OutlierCorrection##, and a description.

    2. Set Periodicity to Weekly.

    3. Set Historical Periods to 12.

    4. Choose the PRE-PROCESSING STEPS tab.

    5. Add a new algorithm by choosing the + icon on the right of the screen.

    6. In the displayed dialog box, choose Outlier Correction.

    7. In the Outlier Detection Method field, choose Variance Test, and in the Multiplier field, enter 1.

    8. In the Outlier Correction Method field, choose Correction with Mean.

    9. Choose Actuals Qty as the input key figure.

    10. In the Save result in field, choose Delivered Qty Adjusted.

    11. Choose Save.

Task 2: Execute the Model for Outlier Cleaning

After we have configured our outlier correction model, we need to execute the calculation.

Steps

  1. Log in to the add-in for Microsoft Excel and choose your planning view favorite. If you have already logged into the add-in for Microsoft Excel, you need to log in again to see the changes made in the prior step.

    1. Log in to the add-in for Microsoft Excel with the login credentials provided by the instructor.

    2. Choose your favorite Demand Planning.

  2. Run the outlier correction calculation and observe the result. Adapt the previous filter created for HT_00## when executing the application job.

    FieldValue
    Time PeriodWeekly
    AttributesProduct ID, Customer ID
    Forecast model to be usedOutlierCorrection##
    UoM to IDKG
    FilterProduct ID =HT_0##
    1. In the Application Jobs part of the ribbon, choose Statistical ForecastingRun.  

    2. A message will pop up informing that the settings of the view will be copied over to the job prompt. Choose Ok.

    3. Set the Time Period to Weekly with 12 weeks into the past.

    4. Select the AttributesProduct ID and Customer ID.

    5. In the Forecast model to be used field, choose OutlierCorrection##.

    6. In the Unit of Measure field, choose KG.

    7. Go to the Filter tab, and filter on your specific product with Product ID = HT_0##.

    8. Choose Next. A pop up for the reason code will show up. Choose Run.

    9. A message will pop up informing about the status of the job, with navigation to detailed status report. Choose Navigate to status to see the status of the job. Refresh the popup box until it reports that the job is complete. Choose OK to close the pop up.

    10. Wait for the job to complete. Check the Business Log in the Status for a list of all outliers that have been corrected.

    11. Refresh the data in the planning view and notice any outlier values that have been adjusted.