Variables in Smart Predict
A variable corresponds to a column in the data set, and rows contain the observations for the variable. For example, in a database containing customer information, the <name> and <address> fields are variables.
In SAP Analytics Cloud Smart Predict, a variable can have the following properties:
- Statistical type: continuous, ordinal, or nominal
- Data type: date, number, or string, for example
- Role: target (or signal), date, entity, influencer, or excluded
Statistical Types
In SAP Analytics Cloud Smart Predict, there are three statistical types of variables:
- Nominal variables: a discrete and unordered set of values or categories
- Ordinal variables: a discrete and ordered set of values
- Continuous variables: a real number that can take any value (with fractions or decimal places)
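As an illustration only, the three statistical types above could be modeled in Python as follows. The `StatisticalType` enum and the `guess_statistical_type` helper are hypothetical names, not part of any SAP API, and the heuristic is an assumption made for this sketch; Smart Predict applies its own rules when it analyzes a dataset.

```python
from enum import Enum


class StatisticalType(Enum):
    NOMINAL = "nominal"        # discrete, unordered categories
    ORDINAL = "ordinal"        # discrete, ordered values
    CONTINUOUS = "continuous"  # real numbers, fractions allowed


def guess_statistical_type(values, ordered=False):
    """Hypothetical heuristic, not how Smart Predict decides.

    Columns whose non-missing values are all floats are treated as
    continuous. Anything else is treated as discrete: ordinal if the
    caller states that the values have a meaningful order, else nominal.
    """
    non_missing = [v for v in values if v is not None]
    if non_missing and all(isinstance(v, float) for v in non_missing):
        return StatisticalType.CONTINUOUS
    return StatisticalType.ORDINAL if ordered else StatisticalType.NOMINAL


salary = [35000.0, 41250.5, None]
income_level = ["low", "medium", "high"]
region = ["North", "South", "East"]
```

Here `salary` would be classified as continuous, `income_level` as ordinal when called with `ordered=True`, and `region` as nominal.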
Data Types
There are two data types used in SAP Analytics Cloud:
- Quantitative or numerical data:
  - Data are numbers and can be quantified.
  - Data can be classified as either discrete or continuous.
  - Data can be counted or measured and summarized using mathematical operations such as addition or multiplication.
  Examples include: age (28 years old), height of a person (200 cm), grade score (85%), and salary amount ($35,000).
- Qualitative or categorical data:
  - Data are either not numbers, or if they are numbers, they cannot be quantified.
  - Data items can be placed into distinct categories based on some attributes or characteristics.
  - Data can only be summarized by frequency count (or mode). No other mathematical operators can be applied.
  Examples include: gender, race, grading system (A, B, C, or 1, 2, 3), and income level (low, medium, high).
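The difference in how the two kinds of data can be summarized can be sketched in Python. This is a minimal illustration using only the standard library; the `summarize` function and the sample values are hypothetical and are not taken from SAP Analytics Cloud.

```python
from collections import Counter
from statistics import mean


def summarize(values, quantitative):
    """Summarize a column according to its kind of data (illustrative only)."""
    if quantitative:
        # Quantitative data supports arithmetic such as sums and averages.
        return {"sum": sum(values), "mean": mean(values)}
    # Qualitative data can only be counted; the mode is the most frequent value.
    counts = Counter(values)
    mode_value, _ = counts.most_common(1)[0]
    return {"counts": dict(counts), "mode": mode_value}


ages = [28, 35, 42]
income_levels = ["low", "high", "low", "medium"]

age_summary = summarize(ages, quantitative=True)
income_summary = summarize(income_levels, quantitative=False)
```

Note that a grading system coded as 1, 2, 3 would still be passed with `quantitative=False`: the values are numbers, but averaging them has no meaning.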
Role
To build a predictive model, you assign the following roles to variables:
- Target or signal variable:
  - There can be only one target/signal variable.
  - The target/signal variable is the variable that you are predicting, in other words the model outcome (sometimes referred to as the dependent variable in other applications).
  - It can be binary (for classification) or continuous (for regression and time series).
  - It cannot contain missing values.
- Date variable:
  - This is the variable used for the date dimension.
  - A date variable is mandatory for a time series predictive scenario, but not for classification or regression scenarios.
- Entity:
  - In a time series forecasting model, an entity is a nominal variable, or a combination of variables, that splits the predictive model data into segments. Each available combination of entity values produces a separate forecasting model, yielding distinct predictions for each combination.
  - The forecasting models can then reflect behaviors that are specific to a given segment, and can therefore produce more accurate predictions.
  - The entity can be a dimension in the data (for example, region, store, or product family) or a combination of dimensions.
  - It is optional in a time series predictive scenario.
- Influencer:
  - There can be multiple influencer variables.
  - Influencers are variables that describe the data and serve to explain the target.
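The role rules above can be expressed as simple checks. The following Python sketch is an assumption-laden illustration, not SAP code: the `roles` mapping, the `validate_roles` and `split_by_entity` functions, the `"time_series"` scenario label, and the sample rows are all hypothetical. It checks that there is exactly one target with no missing values, that a time series scenario has a date variable, and it shows how entity combinations split the data into segments that would each receive their own forecasting model.

```python
from collections import defaultdict


def validate_roles(roles, rows, scenario):
    """Check hypothetical role assignments; returns the target column name."""
    targets = [col for col, role in roles.items() if role == "target"]
    if len(targets) != 1:
        raise ValueError("exactly one target variable is required")
    target = targets[0]
    if any(row.get(target) is None for row in rows):
        raise ValueError("the target variable contains missing values")
    has_date = any(role == "date" for role in roles.values())
    if scenario == "time_series" and not has_date:
        raise ValueError("a time series scenario requires a date variable")
    return target


def split_by_entity(roles, rows):
    """Group rows by each combination of entity values (one segment each)."""
    entity_cols = [col for col, role in roles.items() if role == "entity"]
    segments = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in entity_cols)
        segments[key].append(row)
    return dict(segments)


roles = {
    "month": "date",
    "region": "entity",
    "store": "entity",
    "promo": "influencer",
    "sales": "target",
}
rows = [
    {"month": "2024-01", "region": "North", "store": "A", "promo": 1, "sales": 120.0},
    {"month": "2024-02", "region": "North", "store": "A", "promo": 0, "sales": 95.0},
    {"month": "2024-01", "region": "South", "store": "B", "promo": 0, "sales": 80.0},
]

target = validate_roles(roles, rows, scenario="time_series")
segments = split_by_entity(roles, rows)
```

With this sample data there are two segments, `("North", "A")` and `("South", "B")`, so two separate forecasts would be produced.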
Additional Information
For more information on using variables in Smart Predict, see "Variables in Smart Predict" on the SAP Help Portal.