Variables in Smart Predict
A variable corresponds to a column in the dataset, and the rows contain the observations of that variable. For example, in a database containing information about customers, the customers' name and address are variables.
In SAP Analytics Cloud Smart Predict, a variable can have the following properties:
- Statistical type: continuous, ordinal, or nominal
- Data type: for example, date, number, or string
- Role: target (or signal), date, entity, influencer, or excluded
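To make these three properties concrete, here is a minimal Python sketch that describes each column of a small customer dataset as a variable with a statistical type, a data type, and a role. The names `Variable`, `StatisticalType`, `Role`, and `check_data_types`, and the sample rows, are hypothetical and for illustration only; they are not part of any SAP Analytics Cloud API.

```python
from dataclasses import dataclass
from enum import Enum


class StatisticalType(Enum):
    CONTINUOUS = "continuous"
    ORDINAL = "ordinal"
    NOMINAL = "nominal"


class Role(Enum):
    TARGET = "target"
    DATE = "date"
    ENTITY = "entity"
    INFLUENCER = "influencer"
    EXCLUDED = "excluded"


@dataclass(frozen=True)
class Variable:
    """Hypothetical description of one dataset column."""
    name: str                          # column name in the dataset
    statistical_type: StatisticalType
    data_type: type                    # Python type standing in for number/string/date
    role: Role


# Each dict is one row (an observation); each key is a column (a variable).
customers = [
    {"name": "Ana", "address": "12 Main St", "spend": 120.5, "churned": "no"},
    {"name": "Ben", "address": "3 Oak Ave", "spend": 80.0, "churned": "yes"},
]

variables = [
    Variable("name", StatisticalType.NOMINAL, str, Role.EXCLUDED),
    Variable("address", StatisticalType.NOMINAL, str, Role.INFLUENCER),
    Variable("spend", StatisticalType.CONTINUOUS, float, Role.INFLUENCER),
    Variable("churned", StatisticalType.NOMINAL, str, Role.TARGET),
]


def check_data_types(rows, variables):
    """Return the names of variables whose column values don't match the declared data type."""
    return [
        v.name
        for v in variables
        if not all(isinstance(row[v.name], v.data_type) for row in rows)
    ]


print(check_data_types(customers, variables))  # []
```

A column whose values do not match the declared data type (for example, `spend` stored as the string `"120.5"`) would be reported by name.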
Statistical types
In SAP Analytics Cloud Smart Predict, there are three statistical types of variable:
- Nominal variables: a discrete and unordered set of values or categories
- Ordinal variables: a discrete and ordered set of values
- Continuous variables: real numbers that can take any value, including fractions or decimal places
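The practical difference between the three statistical types is which operations make sense on the values. The following Python sketch is illustrative only: the `summarize` function, its `order` parameter, and the sample values are hypothetical, and Smart Predict does not expose such a function. Nominal values only form a set of categories, ordinal values are ranked by an explicitly supplied order, and continuous values support arithmetic such as the mean.

```python
from statistics import mean


def summarize(values, statistical_type, order=None):
    """Return a summary that is meaningful for the given statistical type."""
    if statistical_type == "nominal":
        # Unordered categories: only the distinct values matter
        # (sorted here purely for a stable, readable result).
        return sorted(set(values))
    if statistical_type == "ordinal":
        # Ordered categories: rank values by the supplied order, not alphabetically.
        rank = {value: position for position, value in enumerate(order)}
        ranked = sorted(values, key=rank.__getitem__)
        return ranked[0], ranked[-1]  # lowest and highest category
    if statistical_type == "continuous":
        # Real numbers: arithmetic summaries such as the mean are meaningful.
        return mean(values)
    raise ValueError(f"unknown statistical type: {statistical_type}")


regions = ["North", "South", "North", "East"]   # nominal
income = ["high", "low", "medium", "low"]       # ordinal
heights_cm = [172.5, 180.0, 165.0]              # continuous

print(summarize(regions, "nominal"))                                  # ['East', 'North', 'South']
print(summarize(income, "ordinal", order=["low", "medium", "high"]))  # ('low', 'high')
print(summarize(heights_cm, "continuous"))                            # 172.5
```

Sorting `income` alphabetically would instead make `'high'` the lowest value, which is why an ordinal variable needs its order defined explicitly.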
Note
When building a model, the Statistical Type dropdown also lists Textual variables. These are nominal variables containing phrases, sentences, or complete texts, and they are used for text analysis. Textual variables are currently not supported by Smart Predict and are not covered in this learning journey.
Data types
There are two data types used in SAP Analytics Cloud:
- Quantitative or numerical data:
  - Data are numbers and can be quantified
  - Data can be classified as either discrete or continuous
  - Data can be counted or measured, and summarized using mathematical operations such as addition or multiplication
  - Examples include: age (28 years old), height of a person (200 cm), grade score (85%), salary amount ($35,000)
- Qualitative or categorical data:
  - Data are either not numbers or, if they are numbers, cannot be quantified
  - Data items can be placed into distinct categories based on attributes or characteristics
  - Data can only be summarized by frequency count (or mode); no other mathematical operations can be applied
  - Examples include: gender, race, grading system (A, B, C, or 1, 2, 3), income level (low, medium, high)
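The rule that quantitative data supports arithmetic while qualitative data supports only frequency counts and the mode can be illustrated with Python's standard `statistics` and `collections` modules. The two helper functions and the sample values are hypothetical illustrations, not Smart Predict functionality.

```python
from collections import Counter
from statistics import mean, mode


def summarize_quantitative(values):
    """Numbers that can be quantified: arithmetic summaries are valid."""
    return {"total": sum(values), "mean": mean(values)}


def summarize_qualitative(values):
    """Categories: only frequency counts and the mode are valid."""
    return {"counts": dict(Counter(values)), "mode": mode(values)}


ages = [28, 35, 42, 35]                              # quantitative
income_levels = ["low", "medium", "high", "medium"]  # qualitative
grades = ["1", "2", "3", "2"]                        # qualitative, even though the codes look numeric

print(summarize_quantitative(ages))
print(summarize_qualitative(income_levels))  # {'counts': {'low': 1, 'medium': 2, 'high': 1}, 'mode': 'medium'}
print(summarize_qualitative(grades))         # {'counts': {'1': 1, '2': 2, '3': 1}, 'mode': '2'}
```

Storing grade codes such as 1, 2, 3 as strings keeps them from being averaged by accident, since their mean would have no meaning.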
Role
To build a predictive model, you assign the following roles to your variables:
- Target or signal variable:
  - There can be only one target/signal variable.
  - The target/signal variable is the variable you are predicting, in other words, the model outcome (sometimes called the dependent variable in other applications).
  - It can be binary (for classification) or continuous (for regression and time series).
  - It cannot contain missing values.
- Date variable:
  - This is the variable that holds the date dimension.
  - A date variable is mandatory for a time series predictive scenario, but not for classification or regression scenarios.
- Entity:
  - In a time series forecasting model, an entity is a nominal variable, or a combination of variables, that splits the predictive model data into segments. Each available combination of entity values produces its own forecasting model, with distinct predictions for that combination.
  - The forecasting models can then reflect behaviors that are specific to a given segment, and so produce more accurate predictions.
  - The entity can be a dimension in the data, for example region, store, or product family, or a combination of dimensions.
  - It is optional in a time series predictive scenario.
- Influencer:
  - There can be multiple influencer variables.
  - Influencers are variables that describe the data and help explain the target.
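The role rules above can be expressed as a few checks. The following Python sketch is a hypothetical illustration, not how Smart Predict validates a predictive scenario: the `Roles` class, the `check_roles` and `forecast_per_entity` functions, the scenario names, and the sample sales data are all assumptions. It checks the target and date rules, then splits the rows into one segment per entity combination. The per-segment "model" is just the mean of the target, a deliberately naive stand-in for a real forecasting model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class Roles:
    """Hypothetical role assignment for a predictive scenario."""
    target: str                                           # a single name: only one target allowed
    date: Optional[str] = None                            # required for time series only
    entities: List[str] = field(default_factory=list)     # optional segmentation
    influencers: List[str] = field(default_factory=list)  # any number allowed


def check_roles(rows: List[dict], roles: Roles, scenario: str) -> List[str]:
    """Return the rule violations found; an empty list means the checks passed."""
    problems = []
    target_values = [row.get(roles.target) for row in rows]
    if any(value is None for value in target_values):
        problems.append(f"target '{roles.target}' has missing values")
    if scenario == "classification" and len(set(target_values) - {None}) != 2:
        problems.append("a classification target must be binary")
    if scenario == "time_series" and roles.date is None:
        problems.append("a time series scenario needs a date variable")
    return problems


def forecast_per_entity(rows: List[dict], roles: Roles) -> Dict[Tuple, float]:
    """Split rows by entity combination and fit one naive model (the mean) per segment."""
    segments: Dict[Tuple, List[float]] = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in roles.entities)  # () when there is no entity
        segments[key].append(row[roles.target])
    return {key: mean(values) for key, values in segments.items()}


sales = [
    {"month": "2024-01", "region": "North", "store": "A", "promo": 0, "revenue": 100.0},
    {"month": "2024-02", "region": "North", "store": "A", "promo": 1, "revenue": 120.0},
    {"month": "2024-01", "region": "South", "store": "B", "promo": 0, "revenue": 80.0},
    {"month": "2024-02", "region": "South", "store": "B", "promo": 1, "revenue": 90.0},
]
roles = Roles(target="revenue", date="month", entities=["region", "store"], influencers=["promo"])

print(check_roles(sales, roles, "time_series"))  # []
print(forecast_per_entity(sales, roles))         # {('North', 'A'): 110.0, ('South', 'B'): 85.0}
```

Modelling `target` as a single string rather than a list makes the "only one target" rule impossible to break. With `entities=[]`, every row falls into the single segment `()`, which matches a time series scenario without an entity.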