Defining Automated Data Encoding


After completing this lesson, you will be able to:

  • Describe the automated data encoding process in SAP Analytics Cloud

Missing Values

A missing value is an empty cell in your data set. Missing values can result from a data collection error, or they can occur because the values are not available.

Understanding why data is missing is important. Investigate the cause, especially if some influencers have a high percentage of missing values.

Smart Predict handles missing values automatically:

  • Missing values are not excluded. They are replaced with a constant called Missing, which the model then treats like any other category.
  • You can assess the influence of the missing values when you have built the model and debriefed the model output.
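
The replacement described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Smart Predict's actual implementation: the function names encode_missing and missing_share, the sample column, and the rule that blank strings count as missing are all assumptions made for this example.

```python
from collections import Counter

# Placeholder category name, mirroring the constant described in the lesson.
MISSING = "Missing"


def encode_missing(values):
    """Replace empty cells (None or blank strings) with the MISSING category.

    Hypothetical helper: treating blank strings as missing is an assumption.
    """
    encoded = []
    for value in values:
        if value is None or (isinstance(value, str) and value.strip() == ""):
            encoded.append(MISSING)
        else:
            encoded.append(value)
    return encoded


def missing_share(values):
    """Return the fraction of cells that end up in the MISSING category."""
    encoded = encode_missing(values)
    if not encoded:
        return 0.0
    return Counter(encoded)[MISSING] / len(encoded)


# Made-up influencer column with two empty cells.
region = ["North", None, "South", "", "North"]
encoded_region = encode_missing(region)
```

Because no rows are dropped, the Missing category stays visible next to the real categories, which is what makes it possible to assess its influence later. A high value from missing_share is a signal to investigate why the data is missing.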


Outliers

For a continuous variable, an outlier is a value that occurs once or with low frequency and lies far from the mean and from the majority of the other values of that variable.

For a categorical variable (nominal or ordinal), an outlier is a single or very low-frequency occurrence of a category of a variable.

Figure: An example using a continuous variable (binning)

Outliers can distort a predictive model and lead to inaccurate predictions, so Smart Predict handles them automatically.

  • For nominal/ordinal variables, outliers are grouped into a dedicated noise category called Other, which collects the infrequent or non-robust categories.
  • For continuous variables, the impact of outliers is reduced by grouping them into the bin for the smallest or largest values of the encoded variable.
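
The two strategies above can be sketched as follows. This is a simplified illustration rather than Smart Predict's actual algorithm: the function names, the min_count threshold, the bin edges, and the sample data are all assumptions chosen for this example, and the product determines its own bins and frequency thresholds.

```python
from bisect import bisect_right
from collections import Counter


def group_rare_categories(values, min_count=2, other_label="Other"):
    """Move categories seen fewer than min_count times into other_label.

    min_count is a hypothetical threshold used only for this sketch.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def bin_continuous(values, edges):
    """Map each value to a bin index based on sorted bin edges.

    Values below the first edge fall into bin 0 and values at or above
    the last edge fall into the last bin, so extreme values share a bin
    with the smallest or largest ordinary values.
    """
    return [bisect_right(edges, x) for x in values]


# Made-up nominal variable: "teal" and "mauve" each occur only once.
colors = ["red", "blue", "red", "blue", "teal", "red", "mauve"]
grouped_colors = group_rare_categories(colors)

# Made-up continuous variable: 950 is far from all other values.
ages = [18, 25, 47, 61, 950]
age_bins = bin_continuous(ages, edges=[20, 40, 60])
```

In this sketch, the extreme value 950 lands in the same bin as 61, so the model sees it as "a large value" rather than as a number that could dominate the fit. Likewise, the two rare colors are merged into Other instead of each forming a category supported by a single row.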

