Explaining data types

After completing this lesson, you will be able to:

  • Define the data types used in Smart Predict


Variables in Smart Predict

A variable corresponds to a column in the dataset, and rows contain the observations for the variable. For example, in a database containing information about customers, the <name> and <address> of those customers are variables.

In SAP Analytics Cloud Smart Predict, a variable can have the following properties:

  • Statistical type: continuous, ordinal or nominal
  • Data type: for example, date, number, or string
  • Role: target (or signal), date, entity, or excluded

Statistical types

In SAP Analytics Cloud Smart Predict there are three statistical types of variable:

  1. Nominal variables: a discrete and unordered set of values or categories
  2. Ordinal variables: a discrete and ordered set of values
  3. Continuous variables: a real number that can take any value (with fractions/decimal places)

When building a model, Textual variables are also listed in the Statistical Types dropdown. These are a type of nominal variable containing phrases, sentences, or complete texts, and are used for text analysis. Textual variables are currently not supported by Smart Predict and are not covered in this learning journey.
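
The three statistical types can be illustrated with a small sketch in plain Python. This is not Smart Predict code; the variable names, the sample values, and the rank mapping for income levels are assumptions made for the example.

```python
# Illustrative only: plain Python, not Smart Predict code.

# Nominal: discrete categories with no inherent order.
regions = ["north", "south", "east", "south"]
distinct_regions = set(regions)  # only equality and membership are meaningful

# Ordinal: discrete categories with a meaningful order.
# The rank mapping below is an assumption made for this example.
INCOME_RANK = {"low": 0, "medium": 1, "high": 2}
income_levels = ["medium", "low", "high", "low"]
sorted_income = sorted(income_levels, key=INCOME_RANK.__getitem__)

# Continuous: real numbers that can take any value, including fractions.
heights_cm = [172.5, 200.0, 158.25]
mean_height = sum(heights_cm) / len(heights_cm)
```

Sorting is only meaningful for the ordinal values because the order is defined explicitly. Sorting the nominal regions alphabetically would carry no information.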

Data types

There are two data types used in SAP Analytics Cloud:

  1. Quantitative or numerical data:
    • Data are numbers and can be quantified
    • Data can be classified as either discrete or continuous
    • Data can be counted or measured, and summarized using mathematical operations such as addition or multiplication

    Examples include: age (28 years old), height of a person (200cm), grade score (85%), salary amount ($35,000)

  2. Qualitative or categorical data:
    • Data are either not numbers, or if they are numbers they cannot be quantified
    • Data items can be placed into distinct categories based on some attributes or characteristics
    • Data can only be summarized by frequency count (or mode). No other mathematical operators can be applied

    Examples include: gender, race, grading system (A, B, C, or 1, 2, 3), income level (low, medium, high)
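
The difference in how the two data types can be summarized can be sketched in Python with the standard library. The sample values below are invented for illustration.

```python
from collections import Counter
from statistics import mean, mode

# Quantitative data: arithmetic summaries are meaningful.
ages = [28, 35, 41, 28]
average_age = mean(ages)
total_age = sum(ages)

# Qualitative data: only frequency counts (and the mode) are meaningful.
grades = ["A", "B", "A", "C", "A"]
grade_counts = Counter(grades)
most_common_grade = mode(grades)

# Numbers can be qualitative too: grading codes 1, 2, 3 label categories,
# so they are counted rather than averaged.
grade_codes = [1, 2, 1, 3, 1]
code_counts = Counter(grade_codes)
```

Python would happily compute an average of the grade codes, but the result would not describe anything real, which is why such values are treated as categories.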


Variable roles

To build a predictive model, you must define the following variable roles:

  1. Target or signal variable:
    • There can only be one target/signal variable
    • A target/signal variable is the variable that you are predicting, in other words, the model outcome (sometimes referred to as the dependent variable in other applications)
    • It can be binary (for classification) or continuous (for regression and time series)
    • There can be no missing values.
  2. Date variable:
    • This is the variable used for a date dimension
    • It is mandatory to include a date variable for a time series predictive scenario, but not for classification or regression scenarios
  3. Entity:
    • In a time series forecasting model, the entity is a nominal variable, or a combination of variables, that splits the predictive model data into segments. Each available combination of entity values produces its own forecasting model and its own distinct predictions.
    • The forecasting models can then reflect behaviors that are specific to a given segment, and so produce more accurate predictions.
    • The entity can be a dimension in the data, for example region, store, or product family, or a combination of dimensions.
    • It is optional in a time series predictive scenario
  4. Influencer:
    • There can be multiple influencer variables
    • Influencers are variables that describe the data and help to explain the target.
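
The role rules above can be sketched as a small Python check. This is a hypothetical illustration, not the Smart Predict API: the `RoleAssignment` class, the `validate` and `split_by_entity` functions, the scenario names, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoleAssignment:
    """Hypothetical container for the variable roles described above."""
    scenario: str                      # "classification", "regression", or "time_series"
    target: str                        # exactly one target/signal column
    date: Optional[str] = None         # mandatory only for time series
    entities: list[str] = field(default_factory=list)     # optional, time series only
    influencers: list[str] = field(default_factory=list)  # any number of columns


def validate(roles, rows):
    """Return a list of rule violations; an empty list means no violation was found."""
    problems = []
    if roles.scenario == "time_series" and roles.date is None:
        problems.append("a time series scenario needs a date variable")
    if any(row.get(roles.target) is None for row in rows):
        problems.append("the target variable has missing values")
    return problems


def split_by_entity(roles, rows):
    """Group rows by entity combination; each group would get its own forecasting model."""
    segments = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in roles.entities)
        segments[key].append(row)
    return dict(segments)


# Invented sample data: monthly sales per region.
rows = [
    {"date": "2023-01-01", "region": "north", "sales": 10.0},
    {"date": "2023-01-01", "region": "south", "sales": 7.5},
    {"date": "2023-02-01", "region": "north", "sales": 12.0},
]
roles = RoleAssignment(
    scenario="time_series", target="sales", date="date", entities=["region"]
)
problems = validate(roles, rows)
segments = split_by_entity(roles, rows)
```

With `region` as the entity, the rows fall into one segment per region, which mirrors how each entity combination receives its own forecast.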

Storage format

To describe the data, SAP Analytics Cloud uses the following data type formats:

  • String
  • Integer
  • Number
  • Boolean
  • Date
  • Date and time
  • Time

While Spatial and Other appear in the data type dropdown, they are not used at this time.

Storage format examples

| The storage format | Is used to describe variables when their values correspond to... | Example |
| --- | --- | --- |
| date | Dates expressed in the following formats: YYYY-MM-DD, YYYY/MM/DD | 2023-11-30, 2022/04/28 |
| datetime | Dates and times expressed in the following formats: YYYY-MM-DD HH:MM:SS, YYYY/MM/DD HH:MM:SS | 2023-11-30 14:08:17, 2022/07/19 09:21:58 |
| number | Figures, or numerical values on which operations may be performed | The variable salary, in US dollars: 1000.00, 1593, and 2000.54 |
| integer | Figures, or numerical integer values on which operations may be performed | The variable age in years: 21, 34, 99 |
| string | Alphanumeric character strings | The variable family name: Cheng, Miller, Benoit. The variable occupation: business analyst, professor, engineer. The variable telephone number: 800 555 1234 |
| boolean | True or false | 1 or 0 |
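
To see how the formats above differ in practice, the following Python sketch guesses a storage format from a raw text value. It is a hypothetical helper, not how SAP Analytics Cloud detects types: the function name `guess_storage_format` and the order of the checks are assumptions. It does not attempt to detect boolean, because a value of 1 or 0 cannot be told apart from an integer by looking at the value alone.

```python
from datetime import datetime

# Date and datetime layouts taken from the storage format examples.
DATE_PATTERNS = ("%Y-%m-%d", "%Y/%m/%d")
DATETIME_PATTERNS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S")


def _matches(value, patterns):
    """Return True if the value parses with at least one of the patterns."""
    for pattern in patterns:
        try:
            datetime.strptime(value, pattern)
            return True
        except ValueError:
            continue
    return False


def guess_storage_format(value: str) -> str:
    """Guess a storage format for one raw text value (illustrative only)."""
    text = value.strip()
    if _matches(text, DATETIME_PATTERNS):
        return "datetime"
    if _matches(text, DATE_PATTERNS):
        return "date"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "number"
    except ValueError:
        return "string"
```

A telephone number such as 800 555 1234 falls through to string because it is not a single numeric value, which matches its placement in the examples.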
