Building a Time Series Model Using SAP Analytics Cloud Smart Predict

Objective

After completing this lesson, you will be able to build a time series model.

Data Sets for a Time Series Model

Input Data Set Recap for Building Time Series Predictive Scenarios

The training data set contains past observations (the history) used to generate the predictive model, along with the dates and times when the observations were recorded.

  • In this data set, the values of the signal variable are known.
  • The data set might also contain some influencer variables.
  • The past and future values of the influencer variables must be known (at least for the expected forecast horizon).

By analyzing the training dataset, Smart Predict generates a time series model.

Time series model with the settings pane open on the right-hand side.

Limitations

The training or application input data set must not contain more than 1,000 columns.

While applying the predictive model to an application dataset, Smart Predict generates additional columns, and the application process can be blocked if the dataset exceeds the 1,000-column limit.
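The column limit can be checked before a data set is submitted. The following is a minimal sketch, not part of Smart Predict itself: the function name and the `generated_columns` parameter are hypothetical, and the lesson does not state how many columns are added during application, so that number has to be supplied as an estimate.

```python
MAX_COLUMNS = 1000  # column limit stated in this lesson


def fits_column_limit(input_columns: int, generated_columns: int = 0) -> bool:
    """Return True if the data set stays within the 1,000-column limit.

    generated_columns is a hypothetical estimate of the columns added when
    the model is applied; the lesson does not give the actual number.
    """
    return input_columns + generated_columns <= MAX_COLUMNS
```

A data set with 998 input columns passes on its own, but fails the check once an estimate of five generated columns is added.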

Build and Train a Time Series Model

Settings for Time Series Models

As with classification and regression models, you must select your data source and edit your column details. However, this lesson covers specific settings for building and training a time series model.

Predictive Goal

Target: The Target variable is the signal that you want to predict the values for or explain.

Date: The Date variable is mandatory for time series models.

In time series predictive scenarios that use a dataset as the data source, every date must include the year, month, and day, regardless of the date granularity selected. Therefore, even if a quarterly or monthly forecast is required, the dataset's date format still needs to include days.

For example, with the YYYY-MM-DD date format, a time series predictive scenario can use any of the following date granularities:

  • Year expressed as YYYY-01-01, where YYYY is variable (moving year).
  • Quarter or Month expressed as YYYY-MM-01, where YYYY-MM is variable (moving month).
  • Week expressed as YYYY-MM-DD, where DD is, for instance, the first day of the week (moving week).
  • Day (calendar dates) expressed as YYYY-MM-DD where YYYY-MM-DD is variable (moving day).
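The granularity conventions above can be sketched as a small helper that maps any calendar date to the full YYYY-MM-DD value representing its period. This is an illustration only: the function name `period_start` is hypothetical, and using Monday as the first day of the week is an assumption (the lesson only says "for instance, the first day of the week").

```python
from datetime import date, timedelta


def period_start(d: date, granularity: str) -> str:
    """Return the YYYY-MM-DD string that represents the period containing d."""
    if granularity == "year":
        start = date(d.year, 1, 1)                      # YYYY-01-01
    elif granularity == "quarter":
        first_month = 3 * ((d.month - 1) // 3) + 1      # 1, 4, 7, or 10
        start = date(d.year, first_month, 1)            # YYYY-MM-01
    elif granularity == "month":
        start = date(d.year, d.month, 1)                # YYYY-MM-01
    elif granularity == "week":
        # Assumption: weeks start on Monday (weekday() == 0).
        start = d - timedelta(days=d.weekday())
    elif granularity == "day":
        start = d
    else:
        raise ValueError(f"unsupported granularity: {granularity}")
    return start.isoformat()
```

For instance, `period_start(date(2024, 8, 15), "quarter")` yields a date string that still contains a day component, which is what the date format requirement asks for.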

Number of forecast periods: The number of forecasts to generate. If the input data set contains future values for influencers, the number of forecasts must be less than or equal to the number of future values in the data set. For example, if the data set contains influencer values for the next six months, the number of forecasts requested cannot exceed six.
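The rule on future influencer values can be expressed as a simple check. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and `future_rows=None` is used here to mean that the data set has no influencers with future values, in which case the rule does not apply.

```python
from typing import Optional


def validate_forecast_periods(requested: int, future_rows: Optional[int] = None) -> bool:
    """Check a requested forecast count against the available future influencer rows."""
    if requested < 1:
        return False
    if future_rows is None:
        # Hypothetical convention: no future influencer values in the data set.
        return True
    return requested <= future_rows
```

With six months of future influencer values, a request for six forecasts passes and a request for seven fails.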

The number of forecasts delivered with confidence intervals is determined as follows:

  • If the training data set contains 12 periods or fewer, it is treated as a small data set. By default, the number of forecasts with confidence intervals is set to 1.
  • In other cases, the number of forecasts with confidence intervals is set to one fifth of the training data set size. For example, if the training data set contains 1,000 rows of data, Smart Predict can provide up to 200 forecasts with confidence intervals. If more than 200 forecasts are requested, the accuracy of forecasts beyond the 200th cannot be evaluated.
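The rule for confidence intervals reduces to a short calculation. The sketch below uses a hypothetical function name, and rounding down when the data set size is not a multiple of five is an assumption, since the lesson does not say how fractions are handled.

```python
SMALL_DATA_SET_PERIODS = 12  # threshold for the small data set case


def forecasts_with_confidence_intervals(training_periods: int) -> int:
    """Number of forecasts delivered with confidence intervals."""
    if training_periods <= SMALL_DATA_SET_PERIODS:
        return 1
    # Assumption: one fifth of the training size, rounded down.
    return training_periods // 5
```

Under these assumptions, a history of 250 observations would allow 50 forecasts with confidence intervals, so a request for 20 forecasts would be fully covered.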

Entity: Entity is an optional variable that splits the data into segments. Smart Predict builds a separate predictive model for each segment, with distinct predictions for each one.

For example, it might be more relevant to forecast KPIs for each store and each product. If this type of prediction is useful, click the box and select the columns to use for segmentation.

Time series model with Time Series Data Source and Predictive Goals section showing.

Limitations for time series models: If the predictive model is configured with more forecast periods or entities than the recommended maximums, it is likely to cause performance issues that can affect other users on the same SAP Analytics Cloud tenant.

  • The maximum number of entities is 1,000.
  • The maximum number of forecast periods (independent of the number of entities) is 500.
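A configuration can be compared against these recommended maximums before training. This is a minimal sketch with hypothetical names; it only reports which limits are exceeded and does not reflect any check that SAP Analytics Cloud performs itself.

```python
MAX_ENTITIES = 1000          # recommended maximum from this lesson
MAX_FORECAST_PERIODS = 500   # recommended maximum, independent of entity count


def limit_warnings(entity_count: int, forecast_periods: int) -> list:
    """Return a list of human-readable warnings for exceeded limits."""
    warnings = []
    if entity_count > MAX_ENTITIES:
        warnings.append(f"{entity_count} entities exceeds the maximum of {MAX_ENTITIES}")
    if forecast_periods > MAX_FORECAST_PERIODS:
        warnings.append(
            f"{forecast_periods} forecast periods exceeds the maximum of {MAX_FORECAST_PERIODS}"
        )
    return warnings
```

An empty list means the configuration is within both limits.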

Predictive Model Training

Train using: Select which observations are used to train the time series model. You have two options:

  1. All Observations: Train the predictive model using all observations available in the data set. Choose the date of the last observation or define the last date.
  2. Window of Observations: Specify a restricted period of observations. Select the number of days, weeks, months, or years to be included in the observation window. Choose the date of the last observation or define the last date (this date must be available in the data set).

Until: You have two options:

  1. Last Observation: Let the application use the last training reference date as a basis.
  2. User-defined Date: Select a specific date (that is available in the data set).
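The combined effect of the Train using and Until settings can be sketched as a filter over dated observations. This is an illustration, not Smart Predict's implementation: the function name is hypothetical, the window is expressed in days only, and treating the window as ending on the last date inclusive is an assumption.

```python
from datetime import date, timedelta


def select_training_window(observations, until=None, window_days=None):
    """Filter (date, value) pairs by an optional end date and window length.

    until=None mimics "Last Observation"; window_days=None mimics
    "All Observations". A user-defined date must exist in the data set.
    """
    dates = [d for d, _ in observations]
    if until is None:
        last = max(dates)
    elif until in dates:
        last = until
    else:
        raise ValueError("the user-defined date must be available in the data set")
    if window_days is None:
        start = min(dates)
    else:
        # Assumption: the window includes the last date itself.
        start = last - timedelta(days=window_days - 1)
    ordered = sorted(observations, key=lambda obs: obs[0])
    return [(d, v) for d, v in ordered if start <= d <= last]
```

Calling the function with only `window_days=3` keeps the three most recent calendar days of data, while passing only `until` keeps the full history up to that date.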

Exclude As Influencer: Select the influencer variables to exclude when training the time series forecast model. For the influencer variables that you keep, the past and future values must be known (at least for the expected forecast horizon).

Convert Negative Forecast Values to Zero: Set negative forecast values to zero. This is useful when negative values are not relevant for the business scenario, for example, when forecasting the number of births. Negative values are forced to take a zero value, which influences error computation and the selection of the best predictive model.
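The effect on error computation can be seen with a small example. The sketch below is illustrative only: the function names are hypothetical, and mean absolute error is used as a stand-in metric because the lesson does not say which error measure Smart Predict uses to compare candidate models.

```python
def clip_negative_forecasts(forecasts):
    """Replace every negative forecast with zero."""
    return [max(0.0, f) for f in forecasts]


def mean_absolute_error(actuals, forecasts):
    """Average absolute difference between actual and forecast values."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


# Illustrative values only, not taken from any real data set.
actuals = [3.0, 0.0, 2.0]
raw_forecasts = [2.5, -1.0, 2.0]
clipped_forecasts = clip_negative_forecasts(raw_forecasts)

error_raw = mean_absolute_error(actuals, raw_forecasts)
error_clipped = mean_absolute_error(actuals, clipped_forecasts)
```

Here the clipped forecasts produce a lower error than the raw ones, because the actual value can never be negative. A change like this in the error is what can alter which candidate model is ranked best.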

Time series model with the Predictive Model Training section showing.

Build and Train a Time Series Model

Business Scenario: You work for a small business that wants to forecast its daily cash flow over the next 20 working days. The business has twelve months' worth of historic working-day cash flow data and has created a number of influencer variables to try to improve the accuracy of the model.

The data that you have been provided is as follows:

Provided Data

Variable: Description

  • Date: Working day date (weekends and public holidays are excluded from the dataset).
  • Cash: Daily cash flow amount (target variable for forecasting).
  • WorkingDaysIndices: 0-based position of the working day within its month (first working day of the month = 0, second = 1, etc.).
  • ReverseWorkingDaysIndices: Same as above but counted from the end of the month (last working day = 0, second-to-last = 1, etc.).
  • MondayMonthInd, TuesdayMonthInd, ..., FridayMonthInd: Which occurrence of that weekday it is within the month's working days (e.g., 3 = third working Monday of the month). Value is non-zero only on the corresponding weekday, 0 otherwise.
  • BeforeLastMonday, BeforeLastTuesday, ..., BeforeLastFriday: Binary flag (0/1). Equals 1 only on the second-to-last occurrence of that weekday in the month's working days.
  • LastMonday, LastTuesday, ..., LastFriday: Binary flag (0/1). Equals 1 only on the last occurrence of that weekday in the month's working days.
  • Last5WDaysInd: Counter from 1 to 5 for the last 5 working days of the month (5th-to-last day = 1, 4th-to-last = 2, ..., last day = 5). Zero for all other days.
  • Last5WDays: Binary flag (0/1). Equals 1 for the last 5 working days of the month.
  • Last4WDaysInd: Counter from 1 to 4 for the last 4 working days of the month (4th-to-last day = 1, ..., last day = 4). Zero for all other days.
  • Last4WDays: Binary flag (0/1). Equals 1 for the last 4 working days of the month.
  • LastWMonth: Binary flag (0/1). Equals 1 for all working days in the last significant working week of the month (the final Mon-Sun calendar week that contains a Friday or has 4+ working days).
  • BeforeLastWMonth: Binary flag (0/1). Equals 1 for all working days in the second-to-last significant working week of the month (the calendar week immediately before the LastWMonth week).
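Several of these influencers depend only on a working day's position within its month, so they can be derived from the list of working dates. The following sketch follows the descriptions above for four of the variables. The function name is hypothetical, and it assumes that the input contains every working day of each month it covers; a partial month at either end of the history would give misleading end-of-month values.

```python
from collections import defaultdict
from datetime import date


def month_position_features(working_days):
    """Derive position-in-month influencers for each working day."""
    by_month = defaultdict(list)
    for d in sorted(working_days):
        by_month[(d.year, d.month)].append(d)

    features = {}
    for days in by_month.values():
        total = len(days)
        for index, d in enumerate(days):
            reverse_index = total - 1 - index   # last working day = 0
            in_last_five = reverse_index < 5
            features[d] = {
                "WorkingDaysIndices": index,    # first working day = 0
                "ReverseWorkingDaysIndices": reverse_index,
                # 5th-to-last day = 1, ..., last day = 5, otherwise 0
                "Last5WDaysInd": 5 - reverse_index if in_last_five else 0,
                "Last5WDays": 1 if in_last_five else 0,
            }
    return features
```

The weekday-based flags (for example, LastMonday or LastWMonth) would need additional logic that is not shown here.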

Task Flow: In this practice exercise, you will:

  1. Create a time series predictive scenario.
  2. Select the data source for the time series model.
  3. Edit the time series data source column details.
  4. Set the predictive goal target, date, and time periods.
  5. Train the model.