Explaining Data Types

Objectives

After completing this lesson, you will be able to:

  • Define the data types used in Smart Predict

Variables

Variables in Smart Predict

A variable corresponds to a column in the data set, and rows contain the observations for the variable. For example, in a database containing information about customers, the <name> and <address> of those customers are variables.

In SAP Analytics Cloud Smart Predict, a variable can have the following properties:

  • Statistical type: continuous, ordinal, or nominal
  • Data type: for example, date, number, or string
  • Role: target (or signal), date, entity, or excluded

Statistical Types

In SAP Analytics Cloud Smart Predict there are three statistical types of variables:

  1. Nominal variables: a discrete and unordered set of values or categories
  2. Ordinal variables: a discrete and ordered set of values
  3. Continuous variables: a real number that can take any value (with fractions or decimal places)
Note

When building a model, textual variables are also listed in the dropdown for Statistical Types. Textual variables are a type of nominal variable containing phrases, sentences, or complete texts, and are used for text analysis. Textual variables are currently not supported by Smart Predict and are not covered in this learning journey.
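
The three statistical types can be illustrated with a short Python sketch. This is only an illustration of the concepts, not Smart Predict code; the sample values and the names `INCOME_ORDER` and `income_rank` are hypothetical.

```python
# Nominal: categories with no inherent order; only equality checks make sense.
regions = {"North", "South", "East"}
is_known_region = "North" in regions

# Ordinal: categories with an explicit order (hypothetical income levels).
INCOME_ORDER = ["low", "medium", "high"]


def income_rank(level):
    """Return the position of an income level in the assumed ordering."""
    return INCOME_ORDER.index(level)


observed_levels = ["high", "low", "medium", "low"]
sorted_levels = sorted(observed_levels, key=income_rank)

# Continuous: real numbers, so arithmetic such as averaging is meaningful.
heights_cm = [170.5, 180.0, 165.5]
average_height = sum(heights_cm) / len(heights_cm)

print(is_known_region, sorted_levels, average_height)
```

Sorting works for the ordinal values only because an order was declared explicitly. The nominal set has no such order, so sorting it would be arbitrary.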

Data Types

There are two data types used in SAP Analytics Cloud:

  1. Quantitative or numerical data:
    • Data are numbers and can be quantified.
    • Data can be classified as either discrete or continuous.
    • Data can be counted or measured and summarized using mathematical operations such as addition or multiplication.

    Examples include: age (28 years old), height of a person (200 cm), grade score (85%), and salary amount ($35,000).

  2. Qualitative or categorical data:
    • Data are either not numbers or, if they are numbers, cannot be quantified.
    • Data items can be placed into distinct categories based on some attributes or characteristics.
    • Data can only be summarized by frequency count (or mode). No other mathematical operators can be applied.

    Examples include: gender, race, grading system (A, B, C, or 1, 2, 3), and income level (low, medium, high).
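
The difference in how the two kinds of data can be summarized can be sketched in Python as follows. The sample values and the function names `summarize_quantitative` and `summarize_qualitative` are assumptions made for illustration; they are not part of SAP Analytics Cloud.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample data.
ages = [28, 35, 41, 28]               # quantitative: can be added and averaged
grades = ["A", "B", "A", "C", "A"]    # qualitative: categories
grade_codes = [1, 2, 1, 3, 1]         # qualitative even though the values are digits


def summarize_quantitative(values):
    """Summarize numerical data with arithmetic operations."""
    return {"sum": sum(values), "mean": mean(values)}


def summarize_qualitative(values):
    """Summarize categorical data by frequency count and mode only."""
    counts = Counter(values)
    mode_value, _ = counts.most_common(1)[0]
    return {"counts": dict(counts), "mode": mode_value}


age_summary = summarize_quantitative(ages)
grade_summary = summarize_qualitative(grades)
code_summary = summarize_qualitative(grade_codes)
print(age_summary, grade_summary, code_summary)
```

The grade codes 1, 2, and 3 are treated like the letters A, B, and C: averaging them would produce a number, but that number would not describe a real quantity.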

Role

To build a predictive model, you define variable roles. Depending on the predictive scenario, some roles are mandatory and others are optional:

  1. Target or signal variable:
    • There can only be one target/signal variable.
    • A target/signal variable is the variable that you are predicting, in other words, the model outcome (sometimes referred to as the dependent variable in other applications).
    • It can be binary (for classification) or continuous (for regression and time series).
    • It cannot contain missing values.
  2. Date variable:
    • This is the variable used for a date dimension.
    • It is mandatory to include a date variable for a time series predictive scenario, but not for classification or regression scenarios.
  3. Entity:
    • In a time series forecasting model, an entity is a nominal variable, or a combination of variables, that splits the predictive model data into segments. Each available combination of entity values produces its own forecasting model, with distinct predictions for each combination.
    • The forecasting models can then reflect behaviors that are specific to a given segment, and therefore produce more accurate predictions.
    • The entity can be a dimension in the data, for example, region, store, or product family, or a combination of dimensions.
    • It is optional in a time series predictive scenario.
  4. Influencer:
    • There can be multiple influencer variables.
    • Influencers are variables that describe the data and serve to explain the target.
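
The role rules above can be sketched as a small Python check. This is a hypothetical illustration, not a Smart Predict API: the role labels, the scenario name "time_series", the sample rows, and the functions `validate_roles` and `split_by_entity` are all assumptions.

```python
from collections import defaultdict


def validate_roles(rows, roles, scenario):
    """Check a role assignment (column name -> role) against the rules above."""
    problems = []
    targets = [col for col, role in roles.items() if role == "target"]
    if len(targets) != 1:
        problems.append("exactly one target variable is required")
    elif any(row.get(targets[0]) is None for row in rows):
        problems.append("the target variable has missing values")
    has_date = any(role == "date" for role in roles.values())
    if scenario == "time_series" and not has_date:
        problems.append("a time series scenario requires a date variable")
    return problems


def split_by_entity(rows, entity_columns):
    """Group rows into one segment per combination of entity values."""
    segments = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in entity_columns)
        segments[key].append(row)
    return dict(segments)


# Hypothetical monthly sales data.
rows = [
    {"date": "2023-01-01", "region": "North", "sales": 100.0},
    {"date": "2023-01-01", "region": "South", "sales": 80.0},
    {"date": "2023-02-01", "region": "North", "sales": 110.0},
]
roles = {"date": "date", "region": "entity", "sales": "target"}

problems = validate_roles(rows, roles, "time_series")
segments = split_by_entity(rows, ["region"])
print(problems, sorted(segments))
```

In this sketch each key in `segments` stands for one entity combination, which corresponds to one forecasting model per segment as described in the entity bullets.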

Additional Information

For more information on using variables in Smart Predict, see "Variables in Smart Predict" on the SAP Help Portal.

Storage Format

To describe the data, SAP Analytics Cloud uses the following data type formats:

  • String
  • Integer
  • Number
  • Boolean
  • Date
  • Date and Time
  • Time
Note

Although Spatial and Other appear in the data type dropdown, they are not currently used.

Examples: Storage Format

Storage format | Used to describe variables when their values correspond to... | Example
-------------- | ------------------------------------------------------------ | -------
Date | Dates expressed in the following formats: YYYY-MM-DD, YYYY/MM/DD | 2023-11-30, 2022/04/28
Date and Time | Dates and times expressed in the following formats: YYYY-MM-DD HH:MM:SS, YYYY/MM/DD HH:MM:SS | 2023-11-30 14:08:17, 2022/07/19 09:21:58
Number | Figures, or numerical values on which operations can be performed. | The variable salary, in U.S. dollars: 1000.00, 1593, and 2000.54
Integer | Figures, or numerical integer values on which operations can be performed. | The variable age in years: 21, 34, 99
String | Alphanumeric character strings. | The variable family name: Cheng, Miller, Benoit. The variable occupation: business analyst, professor, engineer. The variable telephone number: 800 555 1234
Boolean | True or false | 1 or 0
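
As a rough illustration of how raw text values map onto these storage formats, the following Python sketch guesses a format from a single string. It is an assumption-based example, not how SAP Analytics Cloud detects types; the function name `guess_storage_format` and the helper `_matches` are hypothetical. Boolean is deliberately left out because the values 1 and 0 cannot be told apart from integers without more context.

```python
from datetime import datetime

# Date patterns taken from the storage format examples above.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d")
DATETIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S")


def _matches(text, formats):
    """Return True if the text parses with any of the given formats."""
    for fmt in formats:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def guess_storage_format(value):
    """Guess a storage format name for one raw string value."""
    text = value.strip()
    if _matches(text, DATETIME_FORMATS):
        return "Date and Time"
    if _matches(text, DATE_FORMATS):
        return "Date"
    try:
        int(text)
        return "Integer"
    except ValueError:
        pass
    try:
        float(text)
        return "Number"
    except ValueError:
        return "String"


samples = ["2023-11-30", "2022/07/19 09:21:58", "21", "2000.54", "800 555 1234"]
guesses = [guess_storage_format(sample) for sample in samples]
print(guesses)
```

The telephone number falls through to String because it contains spaces and cannot be parsed as a single number, which matches its placement in the String row.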
