Building a Time Series Model Using SAP Analytics Cloud Smart Predict

Objective

After completing this lesson, you will be able to build a time series model using Smart Predict.

Data Sets for a Time Series Model

Input Data Set Recap for Building Time Series Predictive Scenarios

The training data set contains the past observations (the history) that are used to generate the predictive model, together with the date and time when the observations were recorded.

  • In this data set, the values of the signal variable are known.
  • The data set might also contain some influencer variables.
  • The past and future values of the influencer variables must be known (at least for the expected forecast horizon).

By analyzing the training data set, Smart Predict generates the time series model.

Time series model with the settings pane open on the right-hand side.

Limitations

The training or application input data set must not contain more than 1,000 columns.

While applying the predictive model to an application data set, Smart Predict generates additional columns and the application process can get blocked if the application data set already risks crossing the limit of 1,000 columns.

Build and Train a Time Series Model

Settings for Time Series Models

As with classification and regression models, you must select your data source and edit your column details. However, there are specific settings used when building and training a time series model that are covered in this lesson.

Predictive Goal

Target: The Target variable is the signal that you want to predict the values for or explain.

Date: The Date variable is mandatory for time series models.

Regardless of the date granularity selected in the time series predictive scenarios with a data set as the data source, every date format must include years, months, and days. Therefore, even if a quarterly or monthly forecast is required, the date format in the data set still needs to include days.

For example, with the YYYY-MM-DD date format, time series predictive scenarios can be created with the following date granularities:

  • Year expressed as YYYY-01-01 where YYYY is variable (moving year).
  • Quarter or Month expressed as YYYY-MM-01 where YYYY-MM is variable (moving month).
  • Week expressed as YYYY-MM-DD where DD is, for instance, the first day of the week (moving week).
  • Day (calendar dates) expressed as YYYY-MM-DD where YYYY-MM-DD is variable (moving day).
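
The granularity conventions above can be sketched in a few lines of Python. This is only an illustration of how a date could be normalized before the data set is loaded, not something Smart Predict requires you to run. The function name `to_granularity_date` is hypothetical, and two choices are assumptions: Monday is taken as the first day of the week, and a quarter is represented by the first month of that quarter.

```python
from datetime import date, timedelta


def to_granularity_date(d: date, granularity: str) -> str:
    """Return the YYYY-MM-DD string that represents d at a given granularity.

    Hypothetical helper for illustration only.
    """
    if granularity == "year":
        d = d.replace(month=1, day=1)            # YYYY-01-01
    elif granularity == "quarter":
        # Assumption: a quarter is represented by its first month.
        first_month = 3 * ((d.month - 1) // 3) + 1
        d = d.replace(month=first_month, day=1)  # YYYY-MM-01
    elif granularity == "month":
        d = d.replace(day=1)                     # YYYY-MM-01
    elif granularity == "week":
        # Assumption: Monday is the first day of the week.
        d = d - timedelta(days=d.weekday())
    elif granularity != "day":
        raise ValueError(f"Unknown granularity: {granularity}")
    return d.isoformat()


reading = date(2023, 8, 17)  # a Thursday
for g in ("year", "quarter", "month", "week", "day"):
    print(g, to_granularity_date(reading, g))
```

Whatever the granularity, the result always contains year, month, and day, which is the point of the rule stated above.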

Number of forecast periods: The number of forecasts to generate. If the input data set contains future values for influencers, the number of forecasts must be less than or equal to the number of future values in the data set. If there are future values for the next six months, the number of forecasts requested cannot exceed six.
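
As a rough pre-check of this rule, you could count how many future rows the data set provides before choosing the number of forecast periods. The sketch below assumes a layout in which future rows carry influencer values but leave the target empty (`None`). That layout and both function names are assumptions made for illustration.

```python
from typing import Optional, Sequence


def count_future_rows(target_values: Sequence[Optional[float]]) -> int:
    """Count trailing rows whose target is empty (assumed to be future rows)."""
    count = 0
    for value in reversed(target_values):
        if value is not None:
            break
        count += 1
    return count


def check_forecast_periods(requested: int,
                           target_values: Sequence[Optional[float]]) -> None:
    """Raise ValueError if more forecasts are requested than future rows exist."""
    available = count_future_rows(target_values)
    if requested > available:
        raise ValueError(
            f"Requested {requested} forecasts, but only {available} "
            "future rows with influencer values are available."
        )


# Three months of history followed by six future months without a target value.
cash = [10.0, 12.0, 11.0] + [None] * 6
check_forecast_periods(6, cash)  # passes: six future rows are available
```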

The number of forecasts delivered with confidence intervals is determined as follows:

  • If the training data set contains 12 periods or fewer, it is treated as a small data set. By default, the number of forecasts with confidence intervals is set to 1.
  • In other cases, the number of forecasts with confidence intervals is set to 1/5 of the training data set size.
  • For example, if the training data set contains 1,000 rows of data, Smart Predict can provide up to 200 forecasts with confidence intervals. If more than 200 forecasts are required, the accuracy of the forecasts starting from the 201st cannot be evaluated.
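
The rule above amounts to a small piece of arithmetic, sketched here in Python. The function name is hypothetical, and rounding down when the size is not a multiple of five is an assumption, because the text only states "1/5 of the training data set size".

```python
def forecasts_with_confidence_intervals(n_training_periods: int) -> int:
    """Number of forecasts delivered with confidence intervals (illustrative)."""
    if n_training_periods <= 0:
        raise ValueError("The training data set must contain at least one period.")
    if n_training_periods <= 12:
        return 1                      # small data set case
    # Assumption: fractional results are rounded down.
    return n_training_periods // 5


print(forecasts_with_confidence_intervals(12))    # small data set
print(forecasts_with_confidence_intervals(1000))  # 1,000 rows of history
```

With 1,000 training rows the function returns 200, which matches the example in the list above.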

Entity: Entity is an optional variable that splits the predictive model into segments. Each segment produces its own predictive model, with distinct predictions for that segment.

For example, it might be more relevant to have KPIs for each store and product. If this type of prediction is useful, select the box and choose the columns whose values define the segments.

Time series model with Time Series Data Source and Predictive Goals section showing.

Limitations for time series models: If the predictive model is configured with a number of forecast periods or entities beyond the recommended maximum limits, it is likely to create performance issues that can impact other users on the same SAP Analytics Cloud tenant.

  • The maximum number of entities is 1,000.
  • The maximum number of forecast periods (independent of the number of entities) is 500.
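
A simple check against these two recommended limits might look like the sketch below. The constants come from the list above. The function name and the message wording are made up for illustration.

```python
from typing import List

MAX_ENTITIES = 1000          # recommended maximum number of entities
MAX_FORECAST_PERIODS = 500   # recommended maximum number of forecast periods


def limit_warnings(n_entities: int, n_forecast_periods: int) -> List[str]:
    """Return one warning per recommended limit that is exceeded."""
    warnings = []
    if n_entities > MAX_ENTITIES:
        warnings.append(
            f"{n_entities} entities exceeds the recommended maximum of {MAX_ENTITIES}."
        )
    if n_forecast_periods > MAX_FORECAST_PERIODS:
        warnings.append(
            f"{n_forecast_periods} forecast periods exceeds the recommended "
            f"maximum of {MAX_FORECAST_PERIODS}."
        )
    return warnings


for message in limit_warnings(n_entities=1200, n_forecast_periods=21):
    print(message)
```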

Predictive Model Training

Train using: Select which observations in the data set are used to train the time series model. You have two options:

  1. All Observations: Train the predictive model using all observations available in the data set. Choose the date of the last observation or define the last date.
  2. Window of Observations: Specify a restricted period of observations. Select the number of days, weeks, months, or years to be included in the observation window. Choose the date of the last observation or define the last date (this date must be available in the data set).

Until: You have two options:

  1. Last Observation: Let the application use the last training reference date as a basis.
  2. User-defined Date: Select a specific date (that is available in the data set).
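
To make the interaction between Train using and Until concrete, here is a minimal sketch that selects training rows from a list of (date, value) pairs. It is not how Smart Predict is implemented. The function name is hypothetical, the window is expressed in days only, and treating the window as the `window_days` calendar days ending on the Until date is an assumption.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Row = Tuple[date, float]


def select_training_rows(rows: List[Row],
                         until: Optional[date] = None,
                         window_days: Optional[int] = None) -> List[Row]:
    """Pick the observations used for training (illustrative only).

    until=None       -> Last Observation
    until=<date>     -> User-defined Date (must exist in the data)
    window_days=None -> All Observations
    window_days=n    -> Window of Observations (assumed: n days ending at until)
    """
    if not rows:
        return []
    dates = {d for d, _ in rows}
    if until is None:
        until = max(dates)
    elif until not in dates:
        raise ValueError("The user-defined date must be available in the data set.")
    selected = [(d, v) for d, v in rows if d <= until]
    if window_days is not None:
        start = until - timedelta(days=window_days)
        selected = [(d, v) for d, v in selected if d > start]
    return sorted(selected)


history = [(date(2024, 1, day), float(day)) for day in range(1, 11)]
print(select_training_rows(history, window_days=3))        # last three days
print(select_training_rows(history, until=date(2024, 1, 5)))  # up to 5 January
```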

Exclude As Influencer: The past and future values of the influencer variables must be known (at least for the expected forecast horizon). Select the influencer variables to be excluded when the time series forecast model is trained.

Convert Negative Forecast Values to Zero: Sets negative forecasts to zero. This is useful when negative values are not relevant for the business scenario, for example, the number of births. Because the negative values are forced to zero, this setting influences the error computation and the selection of the best predictive model.
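
The effect of this setting on the error computation can be illustrated with a toy example. The sketch below uses the mean absolute error purely as an example metric (the error measure Smart Predict actually uses is not stated here), and the numbers are invented.

```python
from typing import List


def clip_negative(forecasts: List[float]) -> List[float]:
    """Replace every negative forecast with zero."""
    return [max(0.0, f) for f in forecasts]


def mean_absolute_error(actuals: List[float], forecasts: List[float]) -> float:
    """Average absolute difference between actuals and forecasts."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


actuals = [0.0, 2.0, 5.0, 1.0]       # invented values, e.g. counts that cannot be negative
forecasts = [-4.0, 1.0, 6.0, -1.0]   # raw forecasts, two of them negative

raw_error = mean_absolute_error(actuals, forecasts)
clipped_error = mean_absolute_error(actuals, clip_negative(forecasts))
print(raw_error, clipped_error)  # 2.0 0.75
```

Because the error changes once the negative values are clipped, a candidate model that looked worse before clipping can become the best one afterwards.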

Time series model with the Predictive Model Training section showing.

Build and Train a Time Series Model

Business Scenario

You work for a small business that wants to forecast its daily cash flow over the next 21 working days. The business has approximately nine months' worth of historical working-day cash flow data and has created a number of influencer variables to try to improve the accuracy of the model.

The data that you have been provided is as follows:

Provided Data

Variable | Description
Date | Day, month, and year of the readings.
Cash | Cash flow.
BeforeLastMonday, LastMonday, BeforeLastTuesday, LastTuesday, BeforeLastWednesday, LastWednesday, BeforeLastThursday, LastThursday, BeforeLastFriday, LastFriday | Boolean variables that indicate if the information is true or false.
Last5WDays, Last4WDays | Boolean variables that indicate if the date is in the 5 or 4 last working days of the month.
LastWMonth, BeforeLastWMonth | Boolean variables that indicate whether the information is true or false.
WorkingDaysIndices, ReverseWorkingDayIndices | Indices or reverse indices of the working days.
MondayMonthInd, TuesdayMonthInd, WednesdayMonthInd, ThursdayMonthInd, FridayMonthInd | Indices of the week days in the month.
Last5WDaysInd, Last4WDAysInd | Indices of the five or four last working days of the month.
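
To give a feel for how calendar-based influencers like these could be derived from the Date column, here is a minimal Python sketch. It is not the method used to build the provided data set. It assumes that working days are Monday to Friday with no public holidays and that indices start at 1, and the function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Dict, List, Union


def working_days_in_month(year: int, month: int) -> List[date]:
    """All Monday-to-Friday dates of a month (assumption: no holidays)."""
    n_days = calendar.monthrange(year, month)[1]
    days = [date(year, month, day) for day in range(1, n_days + 1)]
    return [d for d in days if d.weekday() < 5]


def working_day_features(d: date) -> Dict[str, Union[int, bool]]:
    """Derive a few of the influencers listed above for one working day."""
    days = working_days_in_month(d.year, d.month)
    index = days.index(d) + 1              # raises ValueError on weekends
    reverse_index = len(days) - index + 1  # 1 = last working day of the month
    return {
        "WorkingDaysIndices": index,
        "ReverseWorkingDayIndices": reverse_index,
        "Last5WDays": reverse_index <= 5,
        "Last4WDays": reverse_index <= 4,
    }


print(working_day_features(date(2024, 3, 25)))
```

Because these variables depend only on the calendar, their future values are known in advance, which is what makes them usable as influencers over the forecast horizon.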

In this practice exercise, you will:

  1. Create a time series predictive scenario.
  2. Select the data source for the time series model.
  3. Edit the time series data source column details.
  4. Set the predictive goal target, date, and time periods.
  5. Train the model.
