Introducing Time Series Models in SAP Analytics Cloud Smart Predict

Objectives

After completing this lesson, you will be able to:

Explain time series analysis in Smart Predict

Use Cases for Time Series Models

Use Augmented Analytics to Control Travel and Expenses

Time series forecasting is useful for estimating future values of a measure where you have a time dimension available to help you identify a trend.

In the case study, we walk you through a scenario for using time series models to control expenses.

What sort of topics can we investigate with a time series model?

You can answer questions, such as:

How will the revenue of a shop evolve over the next month?

What are the expected sales by product and region for the next few weeks?

How will the stock of products vary in a warehouse over the following weeks?

How will cash flow evolve during the next quarter?

Time Series Analysis in Smart Predict

What is a time series?

A time series is a series of data points indexed in time order. Normally, the data points are taken at successive, equally spaced points in time. For example, a time series can track the movement of revenue or costs over a specified period, with data points recorded at regular intervals, such as weekly, monthly, quarterly, or annually.

Signals

Historical values of the target variable and the corresponding dates are required when building and training a time series model. This data 'couple' of date and target value is called the signal. The time series forecasting model in Smart Predict analyzes the signal. Values of other variables taken at the same dates (in the past and future) can be included as influencer variables for the model. These influencer variables are used to refine the analysis of the signal.

The signal is the target variable that you want to explain or predict values for, and it is made up of several components. For example, if you want to forecast product sales for the next six months, Product sales is your signal variable. The components include:

The Trend identifies the general direction in which the time series is heading. It can be decreasing, increasing, or flat.

The Periodic components are the seasonal and periodic patterns that repeat regularly over time.

The Fluctuation reflects the dependencies of the value of the signal at time "t" on previous values "t-1"… "t-10"… "t-n".

The Residuals are what remains of the signal when trends, periodics, and fluctuations have been removed. Residuals are considered to be white noise - a purely random effect.

Question: What is the name of the regular peaks and troughs shown by the orange line in the diagram below?

Answer: If you said Periodic, you are correct! The regular peaks and troughs shown in the diagram above are an example of a periodic signal.

Note

The time series forecasting models in SAP Analytics Cloud are additive. The forecasts are calculated by summing the values calculated for the trend, the cycles (the periodic components), and the fluctuations.
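To make the additive idea concrete, here is a minimal sketch in Python. It is not Smart Predict code: the function name additive_forecast and all component values are hypothetical, chosen only to show that each forecast is the sum of the three component values for the same time step.

```python
def additive_forecast(trend, periodic, fluctuation):
    """Sum the component values for each future time step."""
    return [t + p + f for t, p, f in zip(trend, periodic, fluctuation)]

# Hypothetical component values for four future months (illustrative only).
trend = [100.0, 102.0, 104.0, 106.0]    # steadily increasing
periodic = [5.0, -5.0, 5.0, -5.0]       # pattern that repeats every two months
fluctuation = [1.0, 0.5, 0.25, 0.125]   # effect of previous values, fading out

forecast = additive_forecast(trend, periodic, fluctuation)
print(forecast)  # [106.0, 97.5, 109.25, 101.125]
```

The residuals are not part of the sum because they are treated as random noise that cannot be predicted.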

Forecast Horizon

The horizon is the number of predictions to be estimated in the future. This number depends directly on the size of the historical data.

A ratio of 5:1 between historical data points and predictions is a good guide for estimating the horizon and getting predictions with relevant confidence intervals. This means that if there are 100 historical cases, then 20 values of the target variable can be predicted in the future. Similarly, to predict six months ahead, 30 months of historical data are needed.
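The arithmetic behind this ratio can be sketched in a few lines of Python. The function names max_horizon and required_history are hypothetical helpers for illustration, not part of SAP Analytics Cloud.

```python
def max_horizon(history_size, ratio=5):
    """Largest number of predictions supported by the history at the given ratio."""
    return history_size // ratio

def required_history(horizon, ratio=5):
    """Number of historical data points needed for the requested horizon."""
    return horizon * ratio

print(max_horizon(100))     # 20
print(required_history(6))  # 30
```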

Best practice:

It is generally recommended to discard the history that is too far away in time.

In the example with 100 historical cases, 20 values or fewer can be requested. If you need more predictions, it is better to collect more historical cases.

Using the predictive scenario settings when building your time series model, you can define the time window that is used, either using all available months or restricting to a certain time period.

In SAP Analytics Cloud, the historical data is automatically ordered chronologically and split into two sets:

The first 75% of the data is used to train the time series forecasting model.

The remaining 25% is used to select the best candidate model.
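As an illustration only, a chronological 75/25 split could look like the following Python sketch. Smart Predict performs this step automatically; the function name chronological_split and the sample rows are assumptions made for this example.

```python
from datetime import date

def chronological_split(rows, train_fraction=0.75):
    """Order (date, value) rows by date, then cut them into two consecutive sets."""
    ordered = sorted(rows, key=lambda row: row[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical monthly values, deliberately out of order.
rows = [(date(2023, month, 1), 100 + month) for month in (3, 1, 4, 2, 8, 6, 5, 7)]
training, validation = chronological_split(rows)

print(len(training), len(validation))  # 6 2
```

Unlike a random split, every date in the validation set comes after every date in the training set, so the candidate models are judged on how well they predict data they have not yet seen.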

How is the data internally partitioned to optimize the predictive model?

Smart Predict uses the training and validation data sets and performs the following steps when creating a time series model:

From the training data set, several trial versions of the time series model are trained.

The best trial version of the time series model is selected.

The selected trial version is evaluated using the validation set.

The final predictive time series model is created.
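The idea of training several trial versions and keeping the one that performs best on the validation set can be sketched as follows. This is a simplified illustration, not the Smart Predict algorithm: the three candidate models (naive_last, mean_level, linear_trend) and the use of mean absolute error as the selection criterion are assumptions made for this example.

```python
def naive_last(train, steps):
    """Repeat the last observed value."""
    return [train[-1]] * steps

def mean_level(train, steps):
    """Repeat the average of the training values."""
    return [sum(train) / len(train)] * steps

def linear_trend(train, steps):
    """Fit a least-squares line to the training values and extend it."""
    n = len(train)
    x_mean = (n - 1) / 2
    y_mean = sum(train) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(train))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    slope = numerator / denominator
    intercept = y_mean - slope * x_mean
    return [intercept + slope * (n + k) for k in range(steps)]

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def select_best(train, validation, candidates):
    """Score each candidate on the validation set and return the best one."""
    scores = {
        name: mean_absolute_error(validation, model(train, len(validation)))
        for name, model in candidates.items()
    }
    return min(scores, key=scores.get), scores

# Hypothetical signal: first 75% for training, last 25% for validation.
signal = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
train, validation = signal[:6], signal[6:]
candidates = {
    "naive_last": naive_last,
    "mean_level": mean_level,
    "linear_trend": linear_trend,
}
best_name, scores = select_best(train, validation, candidates)
print(best_name)  # linear_trend
```

Because the sample signal rises steadily, the candidate that models a trend has the lowest error on the validation set and is selected.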

Considerations When Creating a Time Series Forecasting Model

There are a few considerations to take into account when creating your time series forecasting model:

Scale of the predictions: Predictions are produced in the same unit of time as the historical data, whether that is month, week, day, hour, or minute. Therefore, if data values are recorded every month, it is not meaningful to request predictions for the next few days. If data is recorded every minute by sensors, but the minute is not relevant for the use case, then a higher unit of time, such as the hour, should be used.

Aggregation: Consider the aggregation of data in the unit of time required and define an aggregation function.

For example, an aggregation function could calculate one value for the hour from the 60 values measured for each of the 60 minutes of this hour. It can be the first value, the last value, the mid-value, or a calculated value, such as the average or a more complex formula.

An important point to keep in mind is the size of the aggregation. A large aggregation may hide information and decrease the quality of the predictions. However, an appropriate aggregation smooths the signal when there is a lot of noise. Test and experiment to choose the best aggregation function.
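The following Python sketch shows one way to experiment with aggregation functions before loading data. It is an illustration under stated assumptions: the function aggregate_by_hour and the sample readings are hypothetical, and the aggregation happens outside SAP Analytics Cloud.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def aggregate_by_hour(readings, func=mean):
    """Group (timestamp, value) readings by hour and reduce each group with func."""
    buckets = defaultdict(list)
    for timestamp, value in sorted(readings):  # time order within each hour
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: func(values) for hour, values in sorted(buckets.items())}

# Hypothetical sensor readings (a real data set would have 60 per hour).
readings = [
    (datetime(2024, 1, 1, 9, 0), 10.0),
    (datetime(2024, 1, 1, 9, 30), 14.0),
    (datetime(2024, 1, 1, 10, 15), 20.0),
    (datetime(2024, 1, 1, 10, 45), 22.0),
]

hourly_mean = aggregate_by_hour(readings)
hourly_last = aggregate_by_hour(readings, func=lambda values: values[-1])

print(list(hourly_mean.values()))  # [12.0, 21.0]
print(list(hourly_last.values()))  # [14.0, 22.0]
```

Passing the aggregation function as a parameter makes it easy to compare, for example, the average against the last value and see which gives the better predictions.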

Sort the data: The historical data set must be cleaned so that each unit of time corresponds to only one value of the target variable. Smart Predict automatically sorts the data.
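Before building the model, you can check that each unit of time has only one value. The following Python sketch is one possible check; the function name duplicated_dates and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import date

def duplicated_dates(rows):
    """Return the dates that appear more than once in (date, value) rows."""
    counts = Counter(row_date for row_date, _ in rows)
    return sorted(d for d, count in counts.items() if count > 1)

# Hypothetical monthly data with two values recorded for February.
rows = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 2, 1), 110.0),
    (date(2024, 2, 1), 115.0),
    (date(2024, 3, 1), 120.0),
]

duplicates = duplicated_dates(rows)
print(duplicates)  # [datetime.date(2024, 2, 1)]
```

Any date reported by the check needs to be resolved, for example by aggregating its values, so that the signal has exactly one value per unit of time.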