Analyzing the Results of a Regression Model in SAP Analytics Cloud Smart Predict

Objectives

After completing this lesson, you will be able to:

  • Explain how results of a regression model are analyzed in Smart Predict

Overview Report

Global Performance Indicators

Root Mean Squared Error (RMSE) measures the average magnitude of the difference between the values predicted by the predictive model and the actual values (it is the square root of the mean of the squared errors). It provides an estimation of how well the predictive model is able to predict the target value (accuracy).

  • The lower the value of the RMSE, the better the predictive model is.
  • A perfect predictive model (a hypothetical predictive model that would always predict the exact expected value) would have an RMSE value of 0.
  • The RMSE has the advantage of representing the amount of error in the same unit as the predicted column, making it easy to interpret. For example, when predicting an amount in dollars, the RMSE can be interpreted as the amount of error in dollars.
  • To improve the Root Mean Squared Error, add more influencer variables in the training data set.
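The RMSE definition above can be sketched in a few lines of Python. This is an illustrative implementation of the standard formula, not Smart Predict's internal code:

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error: the square root of the mean squared
    difference between actual and predicted values, expressed in the
    same unit as the target column."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Example: predicting amounts in dollars, so the error is also in dollars.
actual = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 205.0]
print(round(rmse(actual, predicted), 2))  # about 8.66 dollars of error
```

A perfect model (every prediction equal to the actual value) would return exactly 0.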

Prediction confidence indicates the predictive model's capacity to achieve the same accuracy when you apply it to a new data set with the same characteristics as the training data set.

  • Prediction confidence takes a value between 0% and 100%.
  • The prediction confidence value should be as close as possible to 100%.
  • To improve prediction confidence, you can add new rows to your data set, for example.

Target Statistics

This section gives descriptive statistics for the target variable in each data set.

  • Minimum: The minimum value found in the data set for the target variable.
  • Maximum: The maximum value found in the data set for the target variable.
  • Mean: The mean of the target variable.
  • Standard deviation: The measure of the extent to which the target values are spread around their average.
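The four target statistics listed above can be reproduced with Python's standard library. This sketch assumes the population standard deviation; the exact variant Smart Predict reports is not specified here:

```python
import statistics

def target_statistics(values):
    """Descriptive statistics for a target variable, mirroring the
    Target Statistics table: minimum, maximum, mean, standard deviation."""
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        # Assumption: population standard deviation (divide by N).
        "standard_deviation": statistics.pstdev(values),
    }

target = [12.0, 18.0, 25.0, 31.0, 14.0]
stats = target_statistics(target)
print(stats["minimum"], stats["maximum"], stats["mean"])  # 12.0 31.0 20.0
```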

Influencer Contributions

The chart in the Overview report shows how the top five influencers impact the target. The same chart appears in the Influencer Contributions report, which additionally covers all remaining influencers.

Predicted vs. Actual

This chart compares the prediction accuracy of the predictive model to a perfect predictive model and shows the predictive model errors.

During the training phase, predictions are calculated by using the training data set. To build the graph, Smart Predict groups these predictions into twenty segments (or bins), with each segment representing roughly 5% of the population.

For each of these segments, some basic statistics are computed:

  • Segment mean is the mean of the predictions on each segment.
  • Target mean is the mean of the actual target values.
  • Target variance is the variance of the actual target values within each segment.
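The binning and per-segment statistics described above can be sketched as follows. This is an illustration of the general technique (sorting by prediction, splitting into 20 roughly equal bins, averaging per bin), not Smart Predict's exact algorithm:

```python
import statistics

def segment_stats(predictions, actuals, n_segments=20):
    """Sort rows by predicted value, split them into n_segments roughly
    equal bins (~5% of the population each for 20 bins), and compute the
    per-segment statistics used to plot the Predicted vs. Actual chart."""
    rows = sorted(zip(predictions, actuals))  # order by predicted value
    size = len(rows)
    segments = []
    for i in range(n_segments):
        lo = i * size // n_segments
        hi = (i + 1) * size // n_segments
        chunk = rows[lo:hi]
        if not chunk:
            continue
        preds = [p for p, _ in chunk]
        targs = [t for _, t in chunk]
        segments.append({
            "segment_mean": statistics.mean(preds),    # X-axis of a dot
            "target_mean": statistics.mean(targs),     # Y-axis of a dot
            "target_variance": statistics.pvariance(targs),
        })
    return segments
```

For a perfect model, every dot would lie on the diagonal where the segment mean equals the target mean.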

By default, the following curves are displayed:

  1. The Validation - Actual curve shows the actual target values as a function of the predictions.
  2. The hypothetical Perfect Model curve represents a model whose predictions are all equal to the actual values.
  3. The Validation - Error Min and Validation - Error Max curves show the range for the actual target values.

The area between the Error Max and Error Min curves represents the possible deviation of the current predictive model: it is the confidence interval around the predictions.

For each curve, a dot on the graph corresponds to the segment mean on the X-axis, and the target mean on the Y-axis.

Interpreting the chart: Three main conclusions can be made using the Predicted vs. Actual chart, depending on the relative positions of the curves on the graph.

  1. If the validation and perfect model curves do not match:
    • The predictive model is not accurate.
    • To confirm this conclusion, check the prediction confidence indicators.
    • If the indicators confirm that the predictive model is unreliable, improve accuracy by adding more rows or variables to the input data set.
  2. If the validation and perfect model curves match closely:
    • The predictive model is accurate.
    • To confirm this conclusion, check the prediction confidence indicators.
    • If the indicators confirm its reliability, trust the predictive model and use its predictions.
  3. If the validation and perfect model curves match closely but diverge significantly on a segment:
    • The predictive model is accurate, but its performance is hindered in the diverging segment.
    • To confirm this conclusion, check the prediction confidence indicators.
    • If the indicators confirm its overall reliability, improve that segment's predictions by adding more rows or variables in the input data set.

Influencer Contributions Report

Influencer Contributions

This chart shows how the influencers impact the target.

  • By default, all influencers are displayed and are sorted by decreasing importance.
  • The Influencer Contributions show the relative importance of each variable used in the predictive model.
  • Only the contributive influencers are displayed in the reports; variables with no contribution are hidden. The most contributive influencers are those that best explain the target.
  • The sum of their contributions equals 100%.
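The normalization described above (dropping zero-contribution variables and scaling the rest to sum to 100%) can be sketched like this. Smart Predict's internal importance scoring is not public, so the raw scores here are hypothetical inputs:

```python
def relative_contributions(raw_importance):
    """Normalize raw variable-importance scores so that contributive
    influencers sum to 100% and zero-contribution variables are dropped.
    Illustrative only: the raw scores are assumed, not Smart Predict's."""
    contributive = {k: v for k, v in raw_importance.items() if v > 0}
    total = sum(contributive.values())
    shares = {k: 100.0 * v / total for k, v in contributive.items()}
    # Sort by decreasing importance, matching the default report view.
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))

scores = {"age": 0.3, "region": 0.1, "id": 0.0, "income": 0.6}
print(relative_contributions(scores))
```

Here "id" contributes nothing and is hidden, and the remaining shares (60%, 30%, 10%) sum to 100%.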

Grouped Category Influence

The Grouped Category Influence report analyzes the influence of the different categories of a variable on the target:

  • If the influence value is positive, a high target value is more likely.
  • If the influence value is negative, a high target value is less likely.
  • The influence of a category can be positive or negative.

Grouped category influence shows groupings of categories of an influencer, where all the categories in a group share the same influence on the target variable.

  • The X-axis represents the influence of the grouped categories on the target variable.
  • The Y-axis represents the grouped categories.

The length and direction of a bar show whether the category has more or fewer high-value observations compared to the mean:

  • A positive bar (influence on target greater than zero) indicates that the category contains more observations from the high target values compared to the mean (calculated on the entire validation data set).
  • A value of 0 means that the category has no specific influence on the target.
  • A negative bar (influence on target less than zero) indicates that the category contains fewer observations from the high target values compared to the mean (calculated on the entire validation data set).
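One simple way to illustrate the positive/negative influence idea is to compare each category's mean target value with the overall mean, as in the sketch below. This is a deliberate simplification: Smart Predict's exact influence computation and category grouping are not public:

```python
from collections import defaultdict

def category_influence(categories, target):
    """Illustrative influence score per category: the category's mean
    target value minus the overall mean. Positive means the category holds
    more high-target observations than average; negative means fewer.
    (A simplification, not Smart Predict's actual computation.)"""
    overall_mean = sum(target) / len(target)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, t in zip(categories, target):
        sums[cat] += t
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] - overall_mean for cat in sums}

cats = ["A", "A", "B", "B"]
vals = [10.0, 20.0, 30.0, 40.0]
print(category_influence(cats, vals))  # A is below the mean, B above it
```

A score of 0 would correspond to a category with no specific influence on the target.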

Grouped Category Statistics

The grouped category statistics chart shows the details of how the grouped categories influence the target variable over the selected data set.

  • The X-axis displays the target mean: For a continuous target, the target mean is the average of the target variable for the category in the data set.
  • The Y-axis displays the frequency of the grouped categories in the selected data set.

Next Steps

Once you have analyzed your predictive model, you have two choices:

1. The predictive model's performance is satisfactory. If you are happy with your model's performance, then use it and apply the model.

2. The predictive model's performance must be improved. If you are unhappy with the model's performance, you must experiment with the settings.

To experiment with the settings, you can:

  • Duplicate the predictive model.
    1. Open the predictive scenario, which contains the predictive model to be duplicated.
    2. Open the Predictive Model list.
    3. Choose the predictive model version to be duplicated.
    4. Select Copy in the menu. An exact (untrained) copy of the original version of the predictive model is created.
    5. Compare the two versions and find the best one.
  • Update the settings of the existing model and retrain it. Note that this erases the previous version.
